ETAPS Foreword


Welcome to the 27th ETAPS! ETAPS 2024 took place in Luxembourg City, the 
beautiful capital of Luxembourg. 

ETAPS 2024 is the 27th instance of the European Joint Conferences on Theory and 
Practice of Software. ETAPS is an annual federated conference established in 1998, 
and consists of four conferences: ESOP, FASE, FoSSaCS, and TACAS. Each con- 
ference has its own Program Committee (PC) and its own Steering Committee (SC). 
The conferences cover various aspects of software systems, ranging from theoretical 
computer science to foundations of programming languages, analysis tools, and formal 
approaches to software engineering. Organising these conferences in a coherent, highly 
synchronized conference programme enables researchers to participate in an exciting 
event, having the possibility to meet many colleagues working in different directions in 
the field, and to easily attend talks of different conferences. On the weekend before the 
main conference, numerous satellite workshops took place that attracted many 
researchers from all over the globe. 

ETAPS 2024 received 352 submissions in total, 117 of which were accepted, 
yielding an overall acceptance rate of 33%. I thank all the authors for their interest in 
ETAPS, all the reviewers for their reviewing efforts, the PC members for their con- 
tributions, and in particular the PC (co-)chairs for their hard work in running this entire 
intensive process. Last but not least, my congratulations to all authors of the accepted 
papers! 

ETAPS 2024 featured the unifying invited speakers Sandrine Blazy (University of 
Rennes, France) and Lars Birkedal (Aarhus University, Denmark), and the invited 
speakers Ruzica Piskac (Yale University, USA) for TACAS and Jérôme Leroux
(Laboratoire Bordelais de Recherche en Informatique, France) for FoSSaCS. Invited 
tutorials were provided by Tamar Sharon (Radboud University, the Netherlands) on 
computer ethics and David Monniaux (Verimag, France) on abstract interpretation. 

As part of the programme we had the first ETAPS industry day. The goal of this day 
was to bring industrial practitioners into the heart of the research community and to 
catalyze the interaction between industry and academia. The day was organized by 
Nikolai Kosmatov (Thales Research and Technology, France) and Andrzej Wąsowski
(IT University of Copenhagen, Denmark). 

ETAPS 2024 was organized by the SnT - Interdisciplinary Centre for Security, 
Reliability and Trust, University of Luxembourg. The University of Luxembourg was 
founded in 2003. The university is one of the best and most international young 
universities with 6,000 students from 130 countries and 1,500 academics from all over 
the globe. The local organisation team consisted of Peter Y.A. Ryan (general chair), 
Peter B. Roenne (organisation chair), Maxime Cordy and Renzo Gaston Degiovanni 
(workshop chairs), Magali Martin and Isana Nascimento (event managers), Marjan
Skrobot (publicity chair), and Afonso Arriaga (local proceedings chair). This team also
organised the online edition of ETAPS 2021, and now we are happy that they agreed to
also organise a physical edition of ETAPS.

ETAPS 2024 is further supported by the following associations and societies: 
ETAPS e.V., EATCS (European Association for Theoretical Computer Science), 
EAPLS (European Association for Programming Languages and Systems), and EASST 
(European Association of Software Science and Technology). 

The ETAPS Steering Committee consists of an Executive Board, and representa- 
tives of the individual ETAPS conferences, as well as representatives of EATCS, 
EAPLS, and EASST. The Executive Board consists of Marieke Huisman (Twente, 
chair), Andrzej Wąsowski (Copenhagen), Thomas Noll (Aachen), Jan Kofroň (Prague),
Barbara König (Duisburg), Arnd Hartmanns (Twente), Caterina Urban (Inria), Jan
Křetínský (Munich), Elizabeth Polgreen (Edinburgh), and Lenore Zuck (Chicago). 

Other members of the steering committee are: Maurice ter Beek (Pisa), Dirk Beyer 
(Munich), Artur Boronat (Leicester), Luis Caires (Lisboa), Ana Cavalcanti (York), 
Ferruccio Damiani (Torino), Bernd Finkbeiner (Saarland), Gordon Fraser (Passau), 
Arie Gurfinkel (Waterloo), Reiner Hähnle (Darmstadt), Reiko Heckel (Leicester),
Marijn Heule (Pittsburgh), Joost-Pieter Katoen (Aachen and Twente), Delia Kesner 
(Paris), Naoki Kobayashi (Tokyo), Fabrice Kordon (Paris), Laura Kovács (Vienna),
Mark Lawford (Hamilton), Tiziana Margaria (Limerick), Claudio Menghi (Hamilton 
and Bergamo), Andrzej Murawski (Oxford), Laure Petrucci (Paris), Peter Y.A. Ryan 
(Luxembourg), Don Sannella (Edinburgh), Viktor Vafeiadis (Kaiserslautern), Stepha- 
nie Weirich (Pennsylvania), Anton Wijs (Eindhoven), and James Worrell (Oxford). 

I would like to take this opportunity to thank all authors, keynote speakers, atten- 
dees, organizers of the satellite workshops, and Springer Nature for their support. 
ETAPS 2024 was also generously supported by a RESCOM grant from the Luxem- 
bourg National Research Foundation (project 18015543). I hope you all enjoyed 
ETAPS 2024. 

Finally, a big thanks to both Peters, Magali and Isana and their local organization 
team for all their enormous efforts to make ETAPS a fantastic event. 


April 2024 Marieke Huisman 
ETAPS SC Chair 
ETAPS e.V. President 


Preface 


These three-volume proceedings contain the papers presented at the 30th International
Conference on Tools and Algorithms for the Construction and Analysis of Systems
(TACAS 2024). TACAS 2024 was part of the 27th European Joint Conferences on
Theory and Practice of Software (ETAPS 2024), which was held from April 6 to 11,
2024, in Luxembourg City, Luxembourg.

TACAS is a forum for researchers, developers and users interested in rigorous tools 
and algorithms for the construction and analysis of systems. The conference aims to 
bridge the gaps between different communities with this common interest and to 
support them in their quest to improve the utility, reliability, flexibility, and efficiency 
of tools and algorithms for building systems. TACAS 2024 interleaves and integrates 
various disciplines, including formal verification of software and hardware systems, 
static analysis, probabilistic programming, program synthesis, concurrency, testing, 
simulations, verification of machine learning/autonomous systems, cyber-physical
systems, SAT/SMT solving, automated and interactive theorem proving, and proof
checking. 

There were four submission categories for TACAS 2024: 


1. Regular research papers identifying and justifying a principled advance to the
theoretical foundations for the construction and analysis of systems.

2. Case study papers describing the application of techniques developed by the
community to a single problem or a set of problems of practical importance,
preferably in a real-world setting.

3. Regular tool papers presenting a novel tool or a new version of an existing tool
built using novel algorithmic and engineering techniques.

4. Tool demonstration papers demonstrating a new tool or application of an existing
tool on a significant case study.


Regular research, case study, and regular tool paper submissions were restricted to
16 pages, whereas tool demonstration papers were restricted to 6 pages, excluding the
bibliography and appendices.

TACAS 2024 received 159 submissions, consisting of 114 regular research papers, 
10 case study papers, 28 regular tool papers, and 7 tool demonstration papers. Each 
submission was assigned for review to at least three Program Committee (PC) mem- 
bers, who made use of subreviewers. Regular research papers were reviewed in double- 
blind mode, whereas case study, regular tool, and tool-demonstration papers were 
reviewed using a single-blind reviewing process. 

Similarly to previous years, it was possible to submit an artifact alongside a paper. 
Artifact submission was mandatory for regular tool and tool demo papers, and vol- 
untary for regular research and case study papers at TACAS 2024. An artifact might 
consist of a tool, models, proofs, or other data required for validation of the results 
of the paper. The Artifact Evaluation Committee (AEC) was tasked with reviewing the 
artifacts, based on their documentation, ease of use, and, most importantly, whether the
results presented in the corresponding paper could be accurately reproduced. Most 
of the evaluation was carried out using a standardized virtual machine to ensure 
consistency of the results, except for those artifacts that had special hardware or 
software requirements. Artifact evaluation at TACAS 2024 consisted of two rounds. 
The first round implemented the mandatory artifact evaluation of regular tool and tool 
demonstration papers; this round was carried out in parallel with the work of the PC. 
The judgment of the AEC was communicated to the PC and weighed in their dis- 
cussion. The second round of artifact evaluation carried out the voluntary artifact 
evaluation of regular research and case study papers, and took place after paper 
acceptance notifications were sent out; authors of accepted regular research and case 
study papers were able to update and revise their respective artifacts before artifact 
evaluation started. In both rounds, the AEC provided 3 reviews per artifact and 
anonymously communicated with the authors to resolve apparent technical issues. In 
total, 104 artifacts were submitted and the AEC evaluated a total of 62 artifacts 
regarding their availability, functionality, and/or reusability. Papers with an artifact that 
were successfully evaluated include one or more badges on the first page, certifying the 
respective properties. 

Selected papers were requested to provide a rebuttal in case a PC review gave rise to 
questions. Using the review reports and rebuttals, the PC had a thorough discussion on 
each paper. For regular tool and tool demonstration papers, the PC also discussed the 
corresponding artifact, using the AEC recommendations. As a result, the PC decided to 
accept 53 papers, out of which there were 35 regular research papers, 11 regular tool 
papers, 3 case study papers, and 4 tool demonstration papers. This corresponds to an 
overall acceptance rate of 33%. Each accepted paper at TACAS 2024 had all positive
reviews, a “championing” PC member who argued in favor of accepting the paper,
or both. All accepted papers at TACAS 2024 had a positive average review score.

TACAS 2024 also hosted SV-COMP 2024, the 13th International Competition on 
Software Verification. The competition evaluated 59 software systems for
automatic verification of C and Java programs and 17 software systems for witness
validation. The TACAS 2024 proceedings contain a competition report by the SV-COMP
chair and organizer. From the 46 actively participating teams, the SV-COMP jury
selected 16 short papers that describe the participating verification and validation 
systems. These 16 short papers are also published in the proceedings and were 
reviewed by a separate program committee (jury); each of these short papers was 
assessed by at least four jury members. Two sessions in the TACAS 2024 program 
were reserved for the presentation of the results: (1) a presentation session with a report 
by the competition chair and summaries by the developer teams of participating tools, 
and (2) an open community meeting in the second session. 

We would like to thank everyone who helped to make TACAS 2024 successful. We 
thank the authors for submitting their papers to TACAS 2024. The PC members and 
additional reviewers did an excellent job in reviewing papers: they provided detailed 
reports and engaged in the PC discussions. We thank the TACAS steering committee, 
and especially its chair, Joost-Pieter Katoen, for his valuable advice. We are grateful to 
the ETAPS steering committee, and in particular its chair, Marieke Huisman, for 
supporting our changes and suggestions on the TACAS 2024 review process and final 
program. We also acknowledge the invaluable support provided by the EasyChair
developers. Lastly, we would like to thank the overall organization team of ETAPS 
2024. 
Abstract. Neural network verification mainly focuses on local robustness
properties, which can be checked by bounding the image (set of
outputs) of a given input set. However, it is often important to know
whether a given property holds globally for the input domain, and if not,
then for what proportion of the input the property is true. Analyzing
such properties requires computing preimage abstractions of neural
networks. In this work, we propose an efficient anytime algorithm for
generating symbolic under-approximations of the preimage of any polyhedron
output set for neural networks. Our algorithm combines a novel tech- 
nique for cheaply computing polytope preimage under-approximations 
using linear relaxation, with a carefully-designed refinement procedure 
that iteratively partitions the input region into subregions using input 
and ReLU splitting in order to improve the approximation. Empirically, 
we validate the efficacy of our method across a range of domains, includ- 
ing a high-dimensional MNIST classification task beyond the reach of 
existing preimage computation methods. Finally, as use cases, we show- 
case the application to quantitative verification and robustness analysis. 
We present a sound and complete algorithm for the former, which ex- 
ploits our disjoint union of polytopes representation to provide formal 
guarantees. For the latter, we find that our method can provide useful 
quantitative information even when standard verifiers cannot verify a 
robustness property. 


1 Introduction 


Despite the remarkable empirical success of neural networks, guaranteeing their
correctness, especially when using them as decision-making components in
safety-critical autonomous systems [7, 13, 43], is an important and challenging task.
Towards this aim, various approaches have been developed for the verification
of neural networks, with extensive effort devoted to local robustness verification
[20, 22, 44, 11, 35, 32, 40, 41, 36]. While local robustness verification focuses
on deciding the absence of adversarial examples within an ε-perturbation
neighbourhood, an alternative approach for neural network analysis is to construct
the preimage of its predictions [27, 15]. Given a set of outputs, the preimage is
defined as the set of all inputs mapped by the neural network to that output set.
By characterizing the preimage symbolically in an abstract representation, e.g.,
polyhedra, one can perform more complex analysis for a wider class of properties
beyond local robustness, such as computing the proportion of inputs satisfying a
property (quantitative verification) even if standard robustness verification fails.

© The Author(s) 2024

Exact preimage generation [27] is intractable, taking time exponential in the 
number of neurons in a network; thus approximations are necessary. Unfortu- 
nately, existing methods are limited in their applicability. The inverse abstrac- 
tion method in [15] bypasses the intractability of exact preimage generation by 
leveraging symbolic interpolants [14, 2] for abstraction of neural network layers. 
However, due to the complexity of interpolation, the time to compute the ab- 
straction also scales exponentially with the number of neurons in hidden layers. 
A concurrent work [23] proposed an input bounding algorithm targeting back- 
ward reachability analysis for control policies and out-of-distribution (OOD) 
detection in low-dimensional domains. Their method produces a preimage over- 
approximation, which cannot be used for quantitative verification. Therefore, 
more efficient and flexible computation methods for (symbolic abstraction of) 
preimages of neural networks are needed. 

The main contribution of this paper is a scalable method for preimage ap- 
proximation, which can be used for a variety of robustness analysis tasks. More 
specifically, we propose an efficient anytime algorithm for generating symbolic 
under-approximations of the preimage of piecewise linear neural networks as a 
union of disjoint polytopes. The algorithm computes a sound preimage under- 
approximation leveraging linear relaxation based perturbation analysis (LiRPA) 
[40, 41, 32], applied backwards from a polyhedron output set. It iteratively re- 
fines the preimage approximation by adding input and/or intermediate (ReLU) 
splitting (hyper)planes to partition the input region into disjoint subregions, 
which can be approximated independently in parallel in a divide-and-conquer 
approach. The refinement scheme uses a novel differentiable objective to optimize
the quality (volume) of the polytope subregions. We also show that our method 
can be generalized to generate preimage over-approximations. We illustrate the 
application of our method to quantitative verification, input bounding for control 
tasks, and robustness analysis against adversarial and patch attacks. Finally, we 
conduct an empirical analysis on a range of control and computer vision tasks, 
showing significant gains in efficiency compared to exact preimage generation 
methods and scalability to high-input-dimensional tasks compared to existing 
preimage approximation methods. 

For space reasons, proofs and additional technical details have been moved
to the appendix of the full version of the paper [45].


2 Preliminaries 


We use $f: \mathbb{R}^d \to \mathbb{R}^m$ to denote a feedforward neural network. For layer $i$, we use
$W^{(i)}$ to denote the weight matrix, $b^{(i)}$ the bias, $h^{(i)}$ the pre-activation neurons,
and $a^{(i)}$ the post-activation neurons, such that we have $h^{(i)} = W^{(i)} a^{(i-1)} + b^{(i)}$.
In this paper, we focus on ReLU neural networks with $a^{(i)}(x) = \mathrm{ReLU}(h^{(i)}(x))$,


Provable Preimage Under-Approximation for Neural Networks 5 


Fig. 1: Linear bounding functions for inactive, active, unstable ReLU neurons. 


where ReLU(h) := max(h, 0) is applied element-wise. However, our method can 
be generalized to other activation functions bounded by linear relaxation [44]. 
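To make the layer notation concrete, the following is a minimal plain-Python sketch of the forward pass $h^{(i)} = W^{(i)} a^{(i-1)} + b^{(i)}$, $a^{(i)} = \mathrm{ReLU}(h^{(i)})$. It assumes $a^{(0)} = x$ and that the final layer is affine with no ReLU applied, which is the usual convention but is not stated explicitly above; the name `forward` and the list-of-$(W, b)$ layer format are illustrative only.

```python
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (W^(i), b^(i))

def forward(layers: Sequence[Layer], x: Sequence[float]) -> List[float]:
    """Evaluate a feedforward ReLU network.

    h^(i) = W^(i) a^(i-1) + b^(i), a^(i) = ReLU(h^(i)), with a^(0) = x.
    Assumption: no activation is applied after the last layer.
    """
    a = list(x)
    for i, (W, b) in enumerate(layers):
        # Affine step: one dot product per row of W^(i)
        h = [sum(w * aj for w, aj in zip(row, a)) + bi for row, bi in zip(W, b)]
        is_last = i == len(layers) - 1
        # Element-wise ReLU on every hidden layer
        a = h if is_last else [max(v, 0.0) for v in h]
    return a
```

For instance, a 2-2-1 network given as `[([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]), ([[2.0, 1.0]], [0.5])]` maps the input `[3.0, 1.0]` to `[5.5]`.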


Linear Relaxation of Neural Networks. Nonlinear activation functions
lead to the NP-completeness of the neural network verification problem [22].
To address such intractability, linear relaxation is often used to transform the
nonconvex constraints into linear programs. As shown in Figure 1, given concrete
lower and upper bounds $l_j^{(i)} \le h_j^{(i)}(x) \le u_j^{(i)}$ on the pre-activation values of layer
$i$, there are three cases to consider. In the inactive ($u_j^{(i)} \le 0$) and active ($l_j^{(i)} \ge 0$)
cases, the post-activation neurons $a_j^{(i)}(x)$ are linear functions $a_j^{(i)}(x) = 0$ and
$a_j^{(i)}(x) = h_j^{(i)}(x)$ respectively. In the unstable case, $a_j^{(i)}(x)$ can be bounded by

$$\alpha_j^{(i)} h_j^{(i)}(x) \le a_j^{(i)}(x) \le \frac{u_j^{(i)}}{u_j^{(i)} - l_j^{(i)}} h_j^{(i)}(x) - \frac{u_j^{(i)} l_j^{(i)}}{u_j^{(i)} - l_j^{(i)}},$$

where $\alpha_j^{(i)}$ is a configurable parameter that produces a valid lower bound for any
value in $[0, 1]$. Linear bounds can also be obtained for other non-piecewise linear
activation functions [44].
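The three cases can be captured by a small function for a single neuron. This sketch returns (slope, intercept) pairs for the lower and upper linear bounds given scalar bounds $l \le h \le u$; the function name and return format are hypothetical choices for illustration.

```python
from typing import Tuple

Line = Tuple[float, float]  # (slope, intercept) of the line s * h + t

def relu_linear_bounds(l: float, u: float, alpha: float = 0.0) -> Tuple[Line, Line]:
    """Return (lower, upper) lines with lower(h) <= ReLU(h) <= upper(h) for all h in [l, u]."""
    if l > u:
        raise ValueError("need l <= u")
    if u <= 0:                      # inactive: ReLU(h) = 0 on [l, u]
        return (0.0, 0.0), (0.0, 0.0)
    if l >= 0:                      # active: ReLU(h) = h on [l, u]
        return (1.0, 0.0), (1.0, 0.0)
    if not 0.0 <= alpha <= 1.0:     # unstable: alpha * h is a valid lower line only for alpha in [0, 1]
        raise ValueError("alpha must lie in [0, 1]")
    s = u / (u - l)                 # slope of the chord through (l, 0) and (u, u)
    return (alpha, 0.0), (s, -s * l)
```

For example, with $l = -1$, $u = 3$ and $\alpha = 0.5$, the upper line is $0.75h + 0.75$, which touches the ReLU at both $h = -1$ and $h = 3$.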


Linear relaxation can be used to compute linear upper and lower bounds of
the form $\underline{A}x + \underline{b} \le f(x) \le \overline{A}x + \overline{b}$ on the output of a neural network, for a given
bounded input region $C$. These methods are known as linear relaxation based
perturbation analysis (LiRPA) algorithms [40, 41, 32]. In particular, backward-mode
LiRPA computes linear bounds on $f$ by propagating linear bounding functions
backward from the output, layer-by-layer, to the input layer.
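A minimal sketch of backward-mode bounding for a network with a single hidden ReLU layer, producing a linear lower bound on $c^\top f(x) + d$ over a box. Assumptions that are ours, not the paper's: the intermediate bounds $l, u$ come from plain interval arithmetic over the box, every unstable neuron shares one $\alpha$, and all names are hypothetical. It only illustrates the backward substitution idea and is not the implementation of any LiRPA library.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

def relu_relaxation(l: float, u: float, alpha: float):
    """((lower slope, lower intercept), (upper slope, upper intercept)) for ReLU on [l, u]."""
    if u <= 0:
        return (0.0, 0.0), (0.0, 0.0)          # inactive
    if l >= 0:
        return (1.0, 0.0), (1.0, 0.0)          # active
    s = u / (u - l)                            # unstable: chord from (l, 0) to (u, u)
    return (alpha, 0.0), (s, -s * l)

def interval_bounds(W1: Matrix, b1: Sequence[float], lo: Sequence[float], hi: Sequence[float]):
    """Concrete bounds l <= W1 x + b1 <= u for x in the box [lo, hi] (interval arithmetic)."""
    l = [b + sum(w * (lo[k] if w >= 0 else hi[k]) for k, w in enumerate(row)) for row, b in zip(W1, b1)]
    u = [b + sum(w * (hi[k] if w >= 0 else lo[k]) for k, w in enumerate(row)) for row, b in zip(W1, b1)]
    return l, u

def backward_lower_bound(W1, b1, W2, b2, c, d, lo, hi, alpha=0.0) -> Tuple[List[float], float]:
    """Return (a, b0) such that a.x + b0 <= c.(W2 ReLU(W1 x + b1) + b2) + d on [lo, hi]."""
    l, u = interval_bounds(W1, b1, lo, hi)
    # Backward step through the output layer: objective = lam . a^(1) + constant
    lam = [sum(c[o] * W2[o][j] for o in range(len(c))) for j in range(len(W1))]
    a = [0.0] * len(lo)
    b0 = d + sum(co * bo for co, bo in zip(c, b2))
    for j, lam_j in enumerate(lam):
        low, up = relu_relaxation(l[j], u[j], alpha)
        # A nonnegative coefficient needs a lower bound on a_j, a negative one an upper bound
        s, t = low if lam_j >= 0 else up
        # Substitute h_j = W1[j] . x + b1[j] to reach the input layer
        for k, w in enumerate(W1[j]):
            a[k] += lam_j * s * w
        b0 += lam_j * (s * b1[j] + t)
    return a, b0
```

Deeper networks repeat the same substitution layer by layer, which is why the backward pass yields a bound that is linear in the input.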


Polytope Representations. Given a Euclidean space $\mathbb{R}^d$, a polyhedron $T$
is defined to be the intersection of a set of half-spaces. More formally, suppose we
have a set of linear constraints defined by $\psi_i(x) := c_i^\top x + d_i \ge 0$ for $i = 1, \dots, K$,
where $c_i \in \mathbb{R}^d$, $d_i \in \mathbb{R}$ are constants, and $x = x_1, \dots, x_d$ is a set of variables. Then
a polyhedron is defined as $T = \{x \in \mathbb{R}^d \mid \bigwedge_{i=1}^{K} \psi_i(x)\}$, where $T$ consists of all
values of $x$ satisfying the first-order logic (FOL) formula $\alpha(x) := \bigwedge_{i=1}^{K} \psi_i(x)$. We
use the term polytope to refer to a bounded polyhedron, that is, a polyhedron $T$
such that $\exists R \in \mathbb{R}_{>0} : \forall x_1, x_2 \in T, \|x_1 - x_2\| \le R$ holds. The abstract domain
of polyhedra [32, 6, 8] has been widely used for the verification of neural networks
and computer programs. An important type of polytope is the hyperrectangle
(box), which is a polytope defined by a closed and bounded interval $[\underline{x}_i, \overline{x}_i]$ for
each dimension, where $\underline{x}_i, \overline{x}_i \in \mathbb{Q}$. More formally, using the linear constraints
$\phi_i := (x_i \ge \underline{x}_i) \wedge (x_i \le \overline{x}_i)$ for each dimension, the hyperrectangle takes the
form $C = \{x \in \mathbb{R}^d \mid x \models \bigwedge_{i=1}^{d} \phi_i\}$.
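A minimal plain-Python sketch of this representation: a polyhedron is stored as a list of $(c_i, d_i)$ pairs, membership is the conjunction $\alpha(x)$, and a box is encoded through its $2d$ half-space constraints $\phi_i$. The helper names are illustrative; a matrix form would be more efficient in practice.

```python
from typing import List, Sequence, Tuple

# A half-space constraint psi_i(x) := c_i . x + d_i >= 0, stored as (c_i, d_i)
HalfSpace = Tuple[Sequence[float], float]

def satisfies(x: Sequence[float], constraints: List[HalfSpace], tol: float = 0.0) -> bool:
    """True iff x satisfies every half-space constraint, i.e. x is in the polyhedron T."""
    return all(sum(ci * xi for ci, xi in zip(c, x)) + d >= -tol for c, d in constraints)

def box_constraints(lo: Sequence[float], hi: Sequence[float]) -> List[HalfSpace]:
    """Encode the hyperrectangle prod_i [lo_i, hi_i] as the 2d half-spaces
    x_i - lo_i >= 0 and hi_i - x_i >= 0 (the constraints phi_i)."""
    dim = len(lo)
    cons: List[HalfSpace] = []
    for i in range(dim):
        e = [0.0] * dim
        e[i] = 1.0
        cons.append((e, -lo[i]))                 # x_i >= lo_i
        cons.append(([-v for v in e], hi[i]))    # x_i <= hi_i
    return cons
```

Adding the constraint $(-1, -1)\cdot x + 1 \ge 0$ to the unit-square box, for instance, gives the triangle $x_1 + x_2 \le 1$, a polytope that contains $(0.2, 0.3)$ but not $(0.8, 0.8)$.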
3 Problem Formulation


3.1 Preimage Approximation 


In this work, we are interested in the problem of computing preimages for neural
networks. Given a subset $O \subseteq \mathbb{R}^m$ of the codomain, the preimage of a function
$f: \mathbb{R}^d \to \mathbb{R}^m$ is defined to be the set of all inputs $x \in \mathbb{R}^d$ that are mapped to
an element of $O$ by $f$. For neural networks in particular, the input is typically
restricted to some bounded input region $C \subseteq \mathbb{R}^d$. In this work, we restrict the
output set $O$ to be a polyhedron, and the input set $C$ to be an axis-aligned
hyperrectangle region $C \subseteq \mathbb{R}^d$, as these are commonly used in neural network
verification. We now define the notion of a restricted preimage:


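Definition 1 (Restricted Preimage). Given a neural network $f: \mathbb{R}^d \to \mathbb{R}^m$,
and an input set $C \subseteq \mathbb{R}^d$, the restricted preimage of an output set $O \subseteq \mathbb{R}^m$ is
defined to be the set $f_C^{-1}(O) := \{x \in \mathbb{R}^d \mid f(x) \in O \wedge x \in C\}$.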


Example 1. To illustrate our problem formulation and approach, we introduce a
vehicle parking task [3] as a running example. In this task, there are four parking
lots, located in each quadrant of a 2 × 2 grid $[0, 2]^2$, and a neural network with
two hidden layers of 10 ReLU neurons $f: \mathbb{R}^2 \to \mathbb{R}^4$ is trained to classify which
parking lot an input point belongs to. To analyze the behaviour of the neural
network in the input region $[0, 1] \times [0, 1]$ corresponding to parking lot 1, we set
$C = \{x \in \mathbb{R}^2 \mid (0 \le x_1 \le 1) \wedge (0 \le x_2 \le 1)\}$. Then the restricted preimage $f_C^{-1}(O)$
of the set $O = \{y \in \mathbb{R}^4 \mid \bigwedge_{i \in \{2,3,4\}} y_1 - y_i \ge 0\}$ is the subspace of the region
$[0, 1] \times [0, 1]$ that is labelled as parking lot 1 by the network.
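The output set of Example 1 is a polyhedron whose constraints have the form $c_i^\top y + d_i \ge 0$ with $c_i = e_1 - e_i$ and $d_i = 0$. The sketch below builds these constraints and checks membership in the restricted preimage for an arbitrary callable `f`. It uses 0-based indices (parking lot 1 is index 0); the helper names are hypothetical, and any `f` passed in is a stand-in, since the trained parking network is not available here.

```python
from typing import Callable, List, Sequence, Tuple

HalfSpace = Tuple[List[float], float]  # (c_i, d_i) encoding c_i . y + d_i >= 0

def argmax_output_set(target: int, num_classes: int) -> List[HalfSpace]:
    """Constraints y[target] - y[i] >= 0 for every i != target (0-based),
    i.e. c_i = e_target - e_i and d_i = 0."""
    cons: List[HalfSpace] = []
    for i in range(num_classes):
        if i == target:
            continue
        c = [0.0] * num_classes
        c[target], c[i] = 1.0, -1.0
        cons.append((c, 0.0))
    return cons

def in_restricted_preimage(f: Callable[[Sequence[float]], Sequence[float]], x: Sequence[float],
                           lo: Sequence[float], hi: Sequence[float], out_cons: List[HalfSpace]) -> bool:
    """x is in f_C^{-1}(O) iff x lies in the box C = [lo, hi] and f(x) meets every constraint of O."""
    if not all(l <= xi <= h for xi, l, h in zip(x, lo, hi)):
        return False
    y = f(x)
    return all(sum(ci * yi for ci, yi in zip(c, y)) + d >= 0 for c, d in out_cons)
```

With `argmax_output_set(0, 4)` one obtains exactly the three constraints of Example 1, one for each competing lot.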


We focus on provable approximations of the preimage. Given a first-order
formula $A$, $\alpha$ is an under-approximation (resp. over-approximation) of $A$ if it
holds that $\forall x.\ \alpha(x) \Rightarrow A(x)$ (resp. $\forall x.\ A(x) \Rightarrow \alpha(x)$). In our context, the
restricted preimage is defined by the formula $A(x) = (f(x) \in O) \wedge (x \in C)$,
and we restrict to approximations $\alpha$ that take the form of a disjoint union of
polytopes (DUP). The goal of our method is to generate a DUP approximation
$T$ that is as tight as possible; that is, to maximize the volume $\mathrm{vol}(T)$ of an
under-approximation, or minimize the volume $\mathrm{vol}(T)$ of an over-approximation.


Definition 2 (Disjoint Union of Polytopes). A disjoint union of polytopes
(DUP) is a FOL formula $\alpha$ of the form $\alpha(x) := \bigvee_{i=1}^{D} \alpha_i(x)$, where each $\alpha_i$ is
a polytope formula (conjunction of a finite set of linear half-space constraints),
with the property that $\alpha_i \wedge \alpha_j$ is unsatisfiable for any $i \neq j$.


3.2 Quantitative Properties 


One of the most important verification problems for neural networks is that of
proving guarantees on the output of a network for a given input set [18, 19, 30].
This is often expressed as a property of the form $(I, O)$ such that $\forall x.\ x \in I \Rightarrow
f(x) \in O$. We can generalize this to quantitative properties:
Definition 3 (Quantitative Property). Given a neural network $f: \mathbb{R}^d \to \mathbb{R}^m$,
a measurable input set with non-zero measure (volume) $I \subseteq \mathbb{R}^d$, a measurable
output set $O \subseteq \mathbb{R}^m$, and a rational proportion $p \in [0, 1]$, we say that the
neural network satisfies the property $(I, O, p)$ if $\frac{\mathrm{vol}(f_I^{-1}(O))}{\mathrm{vol}(I)} \ge p$.¹

Neural network verification algorithms [25] are commonly characterized by two
properties: soundness, meaning that they always return correct results, and
completeness, meaning that they are guaranteed to reach a conclusion on any
verification query. We now define soundness and completeness of verification
algorithms for quantitative properties.


Definition 4 (Soundness). A verification algorithm QV is sound if, whenever 
QV outputs True, the property (I,O,p) holds. 


Definition 5 (Completeness). A verification algorithm QV is complete if (i)
QV never returns Unknown, and (ii) whenever QV outputs False, the property
$(I, O, p)$ does not hold.


If the property $(I, O)$ holds, then the quantitative property $(I, O, 1)$ holds,
while quantitative properties for $0 < p < 1$ provide more information when
$(I, O)$ does not hold. Most neural network verification methods produce
approximations of the image of $I$ in the output space, which cannot be used to
verify quantitative properties. Preimage over-approximations may include regions
outside the preimage, so their volume only upper-bounds that of the preimage and
they cannot be used for quantitative verification. In contrast, preimage
under-approximations provide a lower bound on the volume of the preimage,
allowing us to soundly verify quantitative properties.
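The last point suggests a simple one-sided check. The sketch below assumes, for simplicity, that the DUP pieces are axis-aligned boxes so that their volumes are exact; it is not the paper's Algorithm 3, only an illustration of why a DUP under-approximation suffices to return True soundly. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[Sequence[float], Sequence[float]]  # (lower corner, upper corner)

def box_volume(box: Box) -> float:
    lo, hi = box
    v = 1.0
    for l, h in zip(lo, hi):
        v *= max(h - l, 0.0)
    return v

def check_quantitative(under_approx: List[Box], input_box: Box, p: float) -> str:
    """Sound one-sided check of (I, O, p) from a DUP under-approximation of f_I^{-1}(O).

    The pieces are assumed pairwise disjoint, so their volumes add up without
    double counting and the ratio below never exceeds vol(f_I^{-1}(O)) / vol(I).
    """
    lower = sum(box_volume(b) for b in under_approx) / box_volume(input_box)
    return "True" if lower >= p else "Unknown"
```

For $I = [0,1]^2$ and pieces $[0, 0.5] \times [0, 1]$ and $[0.5, 1] \times [0, 0.5]$, the certified proportion is $0.75$, so $p = 0.7$ is verified while $p = 0.8$ remains Unknown; a complete procedure would additionally have to refine the approximation or refute the property.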


4 Methodology 


Overview. In this section we present the main components of our methodology. 
Firstly, in Section 4.1, we show how to cheaply and soundly under-approximate 
the (restricted) preimage with a single polytope, using linear relaxation meth- 
ods (Algorithm 2). Secondly, in Section 4.2, we propose a novel differentiable 
objective to optimize the quality (volume) of the polytope under-approximation. 
Thirdly, in Section 4.3, we propose a refinement scheme that improves the ap- 
proximation by partitioning a (sub)region into subregions with splitting planes, 
with each subregion then being under-approximated more accurately. The main 
contribution of this paper (Algorithm 1) integrates these three components and 
is described in Section 4.4. Finally, in Section 4.5, we apply our method to quan- 
titative verification (Algorithm 3) and prove its soundness and completeness. 


4.1 Polytope Under-Approximation via Linear Relaxation 


We first show how to adapt linear relaxation techniques to efficiently generate 
valid under-approximations to the restricted preimage for a given input region C. 


¹ The restricted preimage of a polyhedron under a neural network is Lebesgue
measurable, since polyhedra (intersections of finite sets of half-spaces) are
Borel measurable and neural networks are continuous functions.
Algorithm 1: Preimage Approximation
Input: Neural network f, Input region C, Output region O, Volume threshold v,
       Maximum iterations R, Boolean SplitOnInput
Output: Disjoint union of polytopes T_Dom
1  T ← GenUnderApprox(C, O);                          // Initial preimage polytope
2  vol_T, vol_{f_C^{-1}(O)} ← EstimateVol(T), EstimateVol(f_C^{-1}(O));
3  Dom ← {(C, T, vol_{f_C^{-1}(O)} - vol_T)};         // Priority queue
   // T_Dom is the union of polytopes in Dom
4  while EstimateVol(T_Dom) < v and Iterations < R do
5      C_sub, T, Priority ← Pop(Dom);                  // Subregion with highest priority
6      if SplitOnInput then
7          id ← SelectInputFeature(Feature_T);         // Feature_T is the set of input features/dimensions
8      else
9          id ← SelectReLUNode(Node_T);                // Node_T is the set of unstable ReLU nodes
10     [C_sub^l, C_sub^u] ← SplitOnNode(C_sub, id);    // Split on the selected node
11     [T^l, T^u] ← GenUnderApprox([C_sub^l, C_sub^u], O);   // Generate preimage
12     [vol_{T^l}, vol_{T^u}] ← EstimateVol([T^l, T^u]);
13     vol_{f^{-1}_{C_sub^l}(O)}, vol_{f^{-1}_{C_sub^u}(O)} ← EstimateVol(f^{-1}_{C_sub^l}(O)), EstimateVol(f^{-1}_{C_sub^u}(O));
14     Dom ← Dom ∪ {(C_sub^l, T^l, vol_{f^{-1}_{C_sub^l}(O)} - vol_{T^l})}
                 ∪ {(C_sub^u, T^u, vol_{f^{-1}_{C_sub^u}(O)} - vol_{T^u})};   // Disjoint polytope
15 return T_Dom
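A minimal sketch of the priority-driven refinement loop of Algorithm 1 with SplitOnInput = True. The simplifying assumptions are ours, not the paper's: subregions are boxes; a hypothetical stand-in for SelectInputFeature bisects the widest input dimension; the under-approximation is supplied as a callable `under_vol` that returns only the under-approximated volume of a box (in place of GenUnderApprox); and the preimage volume of each subregion is estimated by Monte Carlo sampling against a membership oracle `in_preimage`.

```python
import heapq
import itertools
import random
from typing import Callable, List, Sequence, Tuple

Box = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (lower corner, upper corner)

def volume(box: Box) -> float:
    v = 1.0
    for l, h in zip(*box):
        v *= h - l
    return v

def mc_volume(member: Callable[[Sequence[float]], bool], box: Box,
              n: int, rng: random.Random) -> float:
    """Monte Carlo estimate of the volume of {x in box : member(x)}."""
    lo, hi = box
    hits = sum(member([rng.uniform(l, h) for l, h in zip(lo, hi)]) for _ in range(n))
    return hits / n * volume(box)

def refine(C: Box, under_vol: Callable[[Box], float], in_preimage: Callable[[Sequence[float]], bool],
           v: float, R: int, n: int = 200, seed: int = 0) -> List[Tuple[Box, float]]:
    """Split the subregion with the largest estimated gap until the under-approximation
    reaches volume v or R iterations have run; returns (subregion, approx volume) pairs."""
    rng = random.Random(seed)
    tie = itertools.count()  # tie-breaker so heapq never has to compare boxes

    def entry(box: Box):
        t = under_vol(box)
        gap = mc_volume(in_preimage, box, n, rng) - t   # priority = preimage vol - approx vol
        return (-gap, next(tie), box, t)                # negated: heapq is a min-heap

    dom = [entry(C)]
    iterations = 0
    while sum(e[3] for e in dom) < v and iterations < R:
        _, _, (lo, hi), _ = heapq.heappop(dom)          # subregion with highest priority
        k = max(range(len(lo)), key=lambda i: hi[i] - lo[i])  # hypothetical feature choice
        mid = (lo[k] + hi[k]) / 2
        left: Box = (lo, hi[:k] + (mid,) + hi[k + 1:])
        right: Box = (lo[:k] + (mid,) + lo[k + 1:], hi)
        heapq.heappush(dom, entry(left))
        heapq.heappush(dom, entry(right))
        iterations += 1
    return [(e[2], e[3]) for e in dom]
```

With the toy oracle `lambda x: x[0] + x[1] <= 1.0` on the unit square and an under-approximation that credits a box only when its upper corner satisfies the same inequality, the returned boxes form a staircase whose credited volume grows towards 0.5 from below, while the boxes themselves always partition $C$.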


Recall that LiRPA methods enable us to obtain linear lower and upper bounds
on the output of a neural network $f$, that is, $\underline{A}x + \underline{b} \le f(x) \le \overline{A}x + \overline{b}$, where
the linear coefficients depend on the input region $C$.

Now, suppose that we are interested in computing an under-approximation
to the restricted preimage, given the input hyperrectangle $C = \{x \in \mathbb{R}^d \mid x \models \bigwedge_{i=1}^{d} \phi_i\}$,
and the output polytope specified using the half-space constraints
$\psi_i(y) = (c_i^\top y + d_i \ge 0)$ for $i = 1, \dots, K$ over the output space. Given a constraint
$\psi_i$, we append an additional linear layer at the end of the network $f$, which maps
$y \mapsto c_i^\top y + d_i$, such that the function $g_i: \mathbb{R}^d \to \mathbb{R}$ represented by the new network
is $g_i(x) = c_i^\top f(x) + d_i$. Then, applying LiRPA bounding to each $g_i$, we obtain
lower bounds $\underline{g}_i(x) = a_i^\top x + b_i$ for each $i$, such that $\underline{g}_i(x) \ge 0 \Rightarrow g_i(x) \ge 0$ for
$x \in C$. Notice that, for each $i = 1, \dots, K$, $a_i^\top x + b_i \ge 0$ is a half-space constraint in
the input space. We conjoin these constraints, along with the restriction to the
input region $C$, to obtain a polytope $T_C(O) := \{x \mid \bigwedge_{i=1}^{K} (\underline{g}_i(x) \ge 0) \wedge \bigwedge_{i=1}^{d} \phi_i(x)\}$.
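Appending the layer $y \mapsto c_i^\top y + d_i$ amounts to folding the specification into the network's last affine layer, so that $g_i$ is again an ordinary network with one output. A minimal sketch of that composition step only (the LiRPA bounding of $g_i$ is not shown); the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

def fold_spec_into_last_layer(W_last: List[List[float]], b_last: Sequence[float],
                              c: Sequence[float], d: float) -> Tuple[List[float], float]:
    """Compose y -> c . y + d with the last affine layer y = W_last a + b_last.

    Returns (w, beta) such that the single-output layer a -> w . a + beta computes
    g_i(x) = c_i^T f(x) + d_i from the last hidden activations a.
    """
    n_in = len(W_last[0])
    w = [sum(c[o] * W_last[o][k] for o in range(len(c))) for k in range(n_in)]  # c^T W_last
    beta = sum(co * bo for co, bo in zip(c, b_last)) + d                        # c^T b_last + d
    return w, beta
```

For example, with $c_i = (1, -1, 0)$ and $d_i = 0.5$ the new row is the difference of the first two rows of the last weight matrix, which is exactly the margin $y_1 - y_2$ shifted by $0.5$.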


Proposition 1. $T_C(O)$ is an under-approximation to the restricted preimage
$f_C^{-1}(O)$.
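One way to see why Proposition 1 holds (a sketch only; the full proof is in the appendix of the full version [45]): membership in $T_C(O)$ makes every LiRPA lower bound nonnegative, and soundness of those bounds on $C$ transfers this to the exact specification outputs.

$$
x \in T_C(O)
\;\Rightarrow\; x \in C \wedge \bigwedge_{i=1}^{K} \underline{g}_i(x) \ge 0
\;\Rightarrow\; x \in C \wedge \bigwedge_{i=1}^{K} g_i(x) \ge 0
\;\Rightarrow\; x \in C \wedge \bigwedge_{i=1}^{K} \psi_i(f(x))
\;\Rightarrow\; x \in f_C^{-1}(O)
$$

The second implication uses $g_i(x) \ge \underline{g}_i(x)$ for all $x \in C$, and the third uses $g_i(x) = c_i^\top f(x) + d_i$.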




Algorithm 2: GenUnderApprox
Input: List of subregions C, Output set O, number of samples N
Output: List of polytopes T
1 T = [];
2 for C_sub ∈ C                                     // Parallel over subregions
3 do
4     [g_1(x, α_1), ..., g_K(x, α_K)] ← LinearLowerBound(C_sub, O);
5     x_1, ..., x_N ← Sample(C_sub, N);
6     Loss(α_1, ..., α_K) ← -Σ_{j=1}^{N} σ(-LSE(-g_1(x_j, α_1), ..., -g_K(x_j, α_K)));
7     α_1*, ..., α_K* ← Optimize(Loss(α_1, ..., α_K));
8     T = Append(T, [g_1(x, α_1*) ≥ 0, ..., g_K(x, α_K*) ≥ 0, x ∈ C_sub]);
9 return T


Example 2. Returning to Example 1, the output constraints (for $i = 2, 3, 4$) are
given by $\psi_i = (y_1 - y_i \ge 0) = (c_i^\top y + d_i \ge 0)$, where $c_i := e_1 - e_i$ (where
$e_i$ is the $i$-th standard basis vector) and $d_i := 0$. Applying LiRPA bounding,
we obtain the linear lower bounds $\underline{g}_2(x) = -3.79x_1 + x_2 + 2.65 \ge 0$;
$\underline{g}_3(x) = 0.34x_1 - x_2 - 0.60 \ge 0$; $\underline{g}_4(x) = -1.11x_1 - x_2 + 1.99 \ge 0$ for each constraint.
The intersection of these constraints, shown in Figure 2a, represents the region
where any input is guaranteed to satisfy the output constraints.


We generate the linear bounds in parallel over the output polyhedron
constraints $i = 1, \dots, K$ using backward-mode LiRPA [44], and store the resulting
input polytope $T_C(O)$ as a list of constraints. This highly efficient procedure is
used as the subroutine LinearLowerBound when generating a preimage
under-approximation as a polytope union using Algorithm 2 (Line 4).


4.2 Local Optimization 


One of the key components behind the effectiveness of LiRPA-based bounds
is the ability to efficiently improve the tightness of the bounding function by
optimizing the relaxation parameters $\alpha$ via projected gradient descent. In the
context of local robustness verification, the goal is to optimize the concrete lower
or upper bounds over the (sub)region $C$ [40], i.e., $\min_{x \in C} \underline{A}(\alpha)x + \underline{b}(\alpha)$, where we
explicitly note the dependence of the linear coefficients on $\alpha$. In our case, we are
instead interested in optimizing $\alpha$ to refine the polytope under-approximation,
that is, to increase its volume. Unfortunately, computing the volume of a polytope
exactly is a computationally expensive task, and requires specialized tools [12]
that do not permit easy optimization with respect to the $\alpha$ parameters.

To address this challenge, we propose to use statistical estimation. In
particular, we sample $N$ points $x_1, \dots, x_N$ uniformly from the input domain $C$, then
employ Monte Carlo estimation for the volume of the polytope approximation:


$$\widehat{\mathrm{vol}}(T_{C,\alpha}(O)) = \frac{\mathrm{vol}(C)}{N} \sum_{j=1}^{N} \mathbb{1}_{x_j \in T_{C,\alpha}(O)} \qquad (1)$$
where we highlight the dependence of $T_{C,\alpha}(O) = \{x \mid \bigwedge_{i=1}^{K} \underline{g}_i(x, \alpha_i) \ge 0 \wedge \bigwedge_{i=1}^{d} \phi_i(x)\}$
on $\alpha = (\alpha_1, \dots, \alpha_K)$, and $\alpha_i$ are the $\alpha$-parameters for the linear relaxation of the
neural network $g_i$ corresponding to the $i$-th half-space constraint in $O$. However,
this is still non-differentiable w.r.t. $\alpha$ due to the indicator function. We now show
how to derive a differentiable relaxation which is amenable to gradient-based
optimization:


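$$
\begin{aligned}
\widehat{\mathrm{vol}}(T_{C,\alpha}(O)) &= \frac{\mathrm{vol}(C)}{N} \sum_{j=1}^{N} \mathbb{1}_{x_j \in T_{C,\alpha}(O)} = \frac{\mathrm{vol}(C)}{N} \sum_{j=1}^{N} \mathbb{1}_{\min_{i=1,\ldots,K} \underline{g}_i(x_j, \alpha_i) \geq 0} \\
&\approx \frac{\mathrm{vol}(C)}{N} \sum_{j=1}^{N} \sigma\Big(\min_{i=1,\ldots,K} \underline{g}_i(x_j, \alpha_i)\Big) \\
&\approx \frac{\mathrm{vol}(C)}{N} \sum_{j=1}^{N} \sigma\Big(-\mathrm{LSE}\big(-\underline{g}_1(x_j, \alpha_1), \ldots, -\underline{g}_K(x_j, \alpha_K)\big)\Big)
\end{aligned}
$$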


The second equality follows from the definition of the polytope $T_{C,\alpha}(O)$; namely, a point is in the polytope if it satisfies $\underline{g}_i(x, \alpha_i) \geq 0$ for all $i = 1, \ldots, K$, or equivalently, $\min_{i=1,\ldots,K} \underline{g}_i(x, \alpha_i) \geq 0$. After this, we approximate the indicator function using a sigmoid relaxation, where $\sigma(y) := \frac{1}{1 + e^{-y}}$, as is commonly done in machine learning to define classification losses. Finally, we approximate the minimum over specifications using the log-sum-exp (LSE) function. The log-sum-exp function is defined by $\mathrm{LSE}(y_1, \ldots, y_K) := \log \sum_{i=1}^{K} e^{y_i}$, and is a differentiable approximation to the maximum function; we employ it to approximate the minimization by adding the appropriate sign changes, i.e., $\min_i y_i \approx -\mathrm{LSE}(-y_1, \ldots, -y_K)$. The final expression is now a differentiable function of $\alpha$. We employ this as the loss function in Algorithm 2 (Line 6) for generating a polytope approximation, and optimize volume using projected gradient descent.
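To make Eq. (1) and its relaxation concrete, the pure-Python sketch below evaluates, for a fixed $\alpha$, both the hard Monte Carlo volume estimate and the sigmoid/LSE surrogate. The LiRPA lower bounds are replaced by a hypothetical callable `lower_bounds` returning $[\underline{g}_1(x), \ldots, \underline{g}_K(x)]$, and no gradients are computed: in an actual implementation the surrogate would be written in an autodiff framework so that its gradient with respect to $\alpha$ can drive projected gradient descent, which is not shown here.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable LSE(y_1, ..., y_K) = log(sum_i exp(y_i))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sigmoid(y: float) -> float:
    """Logistic function, written to avoid overflow for large |y|."""
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    e = math.exp(y)
    return e / (1.0 + e)


def volume_estimates(
    lower_bounds: Callable[[List[float]], List[float]],
    lower: Sequence[float],
    upper: Sequence[float],
    n_samples: int = 20000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (hard Monte Carlo estimate of Eq. (1), relaxed surrogate)."""
    rng = random.Random(seed)
    vol_c = math.prod(u - l for l, u in zip(lower, upper))
    hard = soft = 0.0
    for _ in range(n_samples):
        x = [rng.uniform(l, u) for l, u in zip(lower, upper)]
        g = lower_bounds(x)                            # K values g_i(x, alpha_i)
        hard += 1.0 if min(g) >= 0.0 else 0.0          # indicator, not differentiable
        soft += sigmoid(-logsumexp([-v for v in g]))   # smooth stand-in for it
    return vol_c * hard / n_samples, vol_c * soft / n_samples


# Hypothetical bounds: g_1 = 1 - x1 - x2, g_2 = x1 - x2 on C = [0, 1]^2.
# The true polytope volume is 0.25.
hard, soft = volume_estimates(lambda x: [1 - x[0] - x[1], x[0] - x[1]], [0, 0], [1, 1])
print(round(hard, 2), round(soft, 2))
```

The surrogate is only meant as a smooth objective, not an accurate volume estimate. Since $\mathrm{LSE} \geq \max$, the term $-\mathrm{LSE}(-g)$ never exceeds $\min_i g_i$, so the relaxation errs on the conservative side of the minimum.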


Example 3. We revisit the vehicle parking problem in Example 1. Figure 2a and 
2b show the computed under-approximations before and after local optimization. 
We can see that the bounding planes for all three specifications are optimized, 
which effectively improves the approximation quality. 


4.3 Global Branching and Refinement 


As LiRPA performs crude linear relaxation, the resulting bounds can be quite loose even with $\alpha$-optimization, meaning that the polytope approximation $T_C(O)$ is unlikely to constitute a tight under-approximation to the preimage. To address this challenge, we employ a divide-and-conquer approach that iteratively refines our under-approximation of the preimage. Starting from the initial region $C$ represented at the root, our method generates a tree by iteratively partitioning a subregion $C_{\mathrm{sub}}$ represented at a leaf node into two smaller subregions $C_{\mathrm{sub}}^{l}, C_{\mathrm{sub}}^{u}$, which are then attached as children to that leaf node. In this way, the subregions represented by all leaves of the tree are disjoint, and their union is the initial region $C$.
For each leaf subregion $C_{\mathrm{sub}}$ we compute, using LiRPA bounds (Line 4, Algorithm 2), an associated polytope that under-approximates the preimage in $C_{\mathrm{sub}}$. Thus, irrespective of the number of refinements performed, the union of the polytopes corresponding to all leaves forms an anytime DUP under-approximation $\mathcal{T}$ to the preimage in the original region $C$. The process of refining the subregions continues until an appropriate termination criterion is met.

Unfortunately, even with a moderate number of input dimensions or un- 
stable ReLU nodes, naively splitting along all input- or ReLU-planes quickly 
becomes computationally infeasible. For example, splitting a $d$-dimensional hyperrectangle using bisections along each dimension results in $2^d$ subdomains to
approximate. It thus becomes crucial to identify the subregion splits that have 
the most impact on the quality of the under-approximation. Another important 
aspect is how to prioritize which leaf subregion to split. We describe these in 
turn. 

Subregion Selection. Searching through all leaf subregions at each iteration is computationally too expensive. Thus, we propose a subregion selection strategy that prioritizes splitting subregions according to (an estimate of) the difference in volume between the exact preimage $f_{C_{\mathrm{sub}}}^{-1}(O)$ and the (already computed) polytope approximation $T_{C_{\mathrm{sub}}}(O)$ on that subdomain, that is:


$$\mathrm{Priority}(C_{\mathrm{sub}}) = \mathrm{vol}(f_{C_{\mathrm{sub}}}^{-1}(O)) - \mathrm{vol}(T_{C_{\mathrm{sub}}}(O)) \qquad (2)$$


which measures the gap between the polytope under-approximation and the 
optimal approximation, namely, the preimage itself. 

Suppose that a particular leaf subdomain attains the maximum of this metric among all leaves, and we partition it into two subregions $C_{\mathrm{sub}}^{l}, C_{\mathrm{sub}}^{u}$, which we approximate with polytopes $T_{C_{\mathrm{sub}}^{l}}(O), T_{C_{\mathrm{sub}}^{u}}(O)$. As tighter intermediate concrete bounds, and thus linear bounding functions, can be computed on the partitioned subregions, the polytope approximation on each subregion will be refined compared with the single polytope restricted to that subregion.


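Proposition 2. Given any subregion $C_{\mathrm{sub}}$ with polytope approximation $T_{C_{\mathrm{sub}}}(O)$, and its children $C_{\mathrm{sub}}^{l}, C_{\mathrm{sub}}^{u}$ with polytope approximations $T_{C_{\mathrm{sub}}^{l}}(O), T_{C_{\mathrm{sub}}^{u}}(O)$ respectively, it holds that:

$$T_{C_{\mathrm{sub}}^{l}}(O) \cup T_{C_{\mathrm{sub}}^{u}}(O) \supseteq T_{C_{\mathrm{sub}}}(O) \qquad (3)$$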


Corollary 1. In each refinement iteration, the volume of the polytope approximation $\mathcal{T}$ does not decrease.


(a) No optimization (b) After optimization (c) Input split (d) ReLU split

Fig. 2: Refinement and optimization for preimage approximation.

Since computing the volumes in Equation 2 is intractable, we sample $N$ points $x_1, \ldots, x_N$ uniformly from the subdomain $C_{\mathrm{sub}}$ and employ Monte Carlo estimation to estimate the volume for both the preimage and the polytope approximation using the same set of samples, i.e.,

$$\widehat{\mathrm{vol}}(f_{C_{\mathrm{sub}}}^{-1}(O)) = \mathrm{vol}(C_{\mathrm{sub}}) \times \frac{1}{N} \sum_{j=1}^{N} \mathbb{1}_{f(x_j) \in O}, \qquad \widehat{\mathrm{vol}}(T_{C_{\mathrm{sub}}}(O)) = \mathrm{vol}(C_{\mathrm{sub}}) \times \frac{1}{N} \sum_{j=1}^{N} \mathbb{1}_{x_j \in T_{C_{\mathrm{sub}}}(O)}.$$

We stress that volume estimation is only used to prioritize subregion selection, and does not affect the soundness of our method.
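A minimal sketch of the shared-sample estimate of Eq. (2), combined with max-priority selection through Python's `heapq` (a min-heap, hence the negated keys). The predicates `in_preimage` (standing in for checking $f(x) \in O$ with a forward pass) and the per-leaf `in_polytope` (standing in for the LiRPA polytope) are hypothetical placeholders; this is an illustration, not the authors' code.

```python
import heapq
import math
import random
from typing import Callable, List, Sequence, Tuple

Box = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (lower corner, upper corner)
Pred = Callable[[Sequence[float]], bool]


def box_volume(box: Box) -> float:
    lo, hi = box
    return math.prod(h - l for l, h in zip(lo, hi))


def estimate_priority(box: Box, in_preimage: Pred, in_polytope: Pred,
                      n_samples: int, rng: random.Random) -> float:
    """Monte Carlo estimate of Eq. (2) from ONE shared set of samples."""
    lo, hi = box
    gap = 0
    for _ in range(n_samples):
        x = [rng.uniform(l, h) for l, h in zip(lo, hi)]
        # 1[f(x) in O] - 1[x in T_Csub(O)], using the same x for both terms
        gap += int(in_preimage(x)) - int(in_polytope(x))
    return box_volume(box) * gap / n_samples


def select_leaf(leaves: List[Box], in_preimage: Pred, polytopes: List[Pred],
                n_samples: int = 4000, seed: int = 0) -> Tuple[int, float]:
    """Return (index, estimated priority) of the leaf to split next."""
    rng = random.Random(seed)
    heap = []
    for idx, (box, poly) in enumerate(zip(leaves, polytopes)):
        pr = estimate_priority(box, in_preimage, poly, n_samples, rng)
        heapq.heappush(heap, (-pr, idx))  # heapq is a min-heap, so negate
    neg_pr, idx = heapq.heappop(heap)
    return idx, -neg_pr


# Toy stand-ins (hypothetical): "f(x) in O" is x1 + x2 <= 1, and both leaves
# carry the same loose under-approximation x1 + x2 <= 0.5.
in_pre = lambda x: x[0] + x[1] <= 1.0
loose = lambda x: x[0] + x[1] <= 0.5
leaves = [((0.0, 0.0), (0.5, 1.0)), ((0.5, 0.0), (1.0, 1.0))]
idx, pr = select_leaf(leaves, in_pre, [loose, loose])
print(idx, round(pr, 2))  # selects leaf 0; exact gaps are 0.25 and 0.125
```

Because the polytope under-approximates the preimage, each per-sample difference is 0 or 1, so reusing the same samples for both terms keeps the estimated gap non-negative.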

Input Splitting. Given a subregion (hyperrectangle) defined by lower and upper bounds $x_i \in [\underline{x}_i, \overline{x}_i]$ for all dimensions $i = 1, \ldots, d$, input splitting partitions it into two subregions by cutting along some feature $i$. This splitting procedure produces two subregions which are similar to the original subregion, but have updated bounds $[\underline{x}_i, \frac{\underline{x}_i + \overline{x}_i}{2}]$ and $[\frac{\underline{x}_i + \overline{x}_i}{2}, \overline{x}_i]$ for feature $i$ instead. In order to determine which feature/dimension to split on, we propose a greedy strategy. Specifically, for each feature, we generate a pair of polytopes for the two subregions resulting from the split, and choose the feature that results in the greatest total volume of the polytope pair. In practice, another commonly adopted splitting heuristic is to select the dimension with the longest edge [10], that is, to select the feature $i$ with the largest range: $\arg\max_i (\overline{x}_i - \underline{x}_i)$. However, this method falls short in per-iteration improvement of the approximation volume compared to our greedy strategy.
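The two input-splitting rules can be contrasted in a small sketch. Midpoint bisection and the longest-edge rule follow the text directly; the function `inside_halfspace_volume`, which credits a child box with its full volume only if it lies entirely inside a half-space, is a hypothetical stand-in for running LiRPA on each child and measuring the resulting polytope.

```python
import math
from typing import Callable, List, Tuple

Box = Tuple[List[float], List[float]]  # (lower corner, upper corner)


def bisect(box: Box, i: int) -> Tuple[Box, Box]:
    """Split a hyperrectangle at the midpoint of dimension i."""
    lo, hi = box
    mid = 0.5 * (lo[i] + hi[i])
    left_hi, right_lo = list(hi), list(lo)
    left_hi[i], right_lo[i] = mid, mid
    return (list(lo), left_hi), (right_lo, list(hi))


def longest_edge(box: Box) -> int:
    """Baseline heuristic: the dimension with the largest range."""
    lo, hi = box
    return max(range(len(lo)), key=lambda i: hi[i] - lo[i])


def greedy_split(box: Box, under_approx_volume: Callable[[Box], float]) -> int:
    """Greedy rule: try every dimension, keep the one whose two children have
    the largest total under-approximation volume."""
    def pair_volume(i: int) -> float:
        left, right = bisect(box, i)
        return under_approx_volume(left) + under_approx_volume(right)
    return max(range(len(box[0])), key=pair_volume)


def inside_halfspace_volume(a: List[float], c: float) -> Callable[[Box], float]:
    """Toy stand-in for the LiRPA polytope: a box counts (with its full
    volume) only if it lies entirely inside {x : a^T x <= c}."""
    def vol(box: Box) -> float:
        lo, hi = box
        worst = sum(ai * (h if ai > 0 else l) for ai, l, h in zip(a, lo, hi))
        return math.prod(h - l for l, h in zip(lo, hi)) if worst <= c else 0.0
    return vol


box = ([0.0, 0.0], [2.0, 1.0])
score = inside_halfspace_volume(a=[0.0, 1.0], c=0.5)  # region x2 <= 0.5
print(longest_edge(box), greedy_split(box, score))     # 0 1
```

In this toy case the longest-edge rule cuts along $x_1$ and gains no covered volume, whereas the greedy rule cuts along $x_2$ and covers half of the box; the greedy rule pays for this with $d$ candidate evaluations per split.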


Example 4. We revisit the vehicle parking problem in Example 1. Figure 2b 
shows the polytope under-approximation computed on the input region C before 
refinement, where each solid line represents the bounding plane for each output 
specification ($y_1 - y_i \geq 0$). Figure 2c depicts the refined approximation obtained by splitting the input region along the vertical axis, where the solid and dashed lines
represent the bounding planes for the two resulting subregions. It can be seen 
that the total volume of the under-approximation has improved significantly. 


Intermediate ReLU Splitting. Refinement through splitting on input fea- 
tures is adequate for low-dimensional input problems such as reinforcement learn- 
ing agents. However, it may be infeasible to generate sufficiently fine subregions 
for high-dimensional domains. We thus propose an algorithm for ReLU neural 
networks that uses intermediate ReLU splitting for preimage refinement. After 
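determining a subregion for refinement, we partition the subregion based upon the pre-activation value $\hat{a}_j^{(i)}(x)$ of an intermediate unstable neuron (neuron $j$ in layer $i$), cutting along $\hat{a}_j^{(i)}(x) = 0$. As a result, the original subregion $C_{\mathrm{sub}}$ is split into two new subregions $C_{\mathrm{sub}}^{+} = \{x \in C_{\mathrm{sub}} \mid \hat{a}_j^{(i)}(x) \geq 0\}$ and $C_{\mathrm{sub}}^{-} = \{x \in C_{\mathrm{sub}} \mid \hat{a}_j^{(i)}(x) < 0\}$.²

² To obtain a polytope under-approximation, we can utilize linear lower/upper bounds on $\hat{a}_j^{(i)}(x)$ as an approximation to the subregion boundary.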
In this procedure, the order of splitting unstable ReLU neurons can greatly influence the refinement quality and efficiency. Existing heuristic methods of ReLU prioritization select ReLU nodes that lead to greater improvement in the final bound (maximum or minimum value) of the neural network on the input domain [10], i.e., $\min_{x \in C} f(x)$. However, these ReLU prioritization methods are not effective for preimage analysis, because our objective is instead to refine the overall preimage approximation. We thus propose a heuristic method to prioritize unstable ReLU nodes for preimage refinement. Specifically, we compute (an estimate of) the volume difference between the split subregions, $|\mathrm{vol}(C_{\mathrm{sub}}^{+}) - \mathrm{vol}(C_{\mathrm{sub}}^{-})|$, using a single forward pass for a set of sampled datapoints from the input domain; note that this is bounded above by the total subregion volume $\mathrm{vol}(C_{\mathrm{sub}})$. We then propose to select the ReLU node that minimizes this difference. Intuitively, this choice results in balanced subdomains after splitting.
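The balance heuristic can be sketched as follows for a single hidden layer, assuming the pre-activations are computed by a plain forward pass on uniformly sampled inputs. The weights, the function names `pre_activations` and `select_relu`, and the restriction to one layer are assumptions made for illustration; for deeper networks the forward pass would run through all preceding layers.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def pre_activations(W: List[List[float]], b: List[float], x: Sequence[float]) -> List[float]:
    """Pre-activation values of one hidden layer (the forward pass)."""
    return [sum(w * xj for w, xj in zip(row, x)) + bias for row, bias in zip(W, b)]


def select_relu(W, b, unstable: List[int], lower, upper,
                n_samples: int = 2000, seed: int = 0) -> Tuple[int, Dict[int, float]]:
    """Pick the unstable neuron whose split gives the most balanced children,
    i.e. the smallest estimate of |vol(C_sub^+) - vol(C_sub^-)|."""
    rng = random.Random(seed)
    vol = math.prod(u - l for l, u in zip(lower, upper))
    n_pos = {j: 0 for j in unstable}
    for _ in range(n_samples):
        x = [rng.uniform(l, u) for l, u in zip(lower, upper)]
        z = pre_activations(W, b, x)           # one forward pass per sample
        for j in unstable:
            n_pos[j] += z[j] >= 0.0
    # n_neg = N - n_pos, so |n_pos - n_neg| = |2 * n_pos - N|
    diff = {j: vol * abs(2 * n_pos[j] - n_samples) / n_samples for j in unstable}
    return min(unstable, key=lambda j: diff[j]), diff


# Hypothetical layer on C = [0, 1]^2 with three unstable neurons:
# z0 = x1 - 0.5 (balanced), z1 = x1 + x2 - 0.3, z2 = x2 - 0.9.
W = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
b = [-0.5, -0.3, -0.9]
choice, diff = select_relu(W, b, [0, 1, 2], [0.0, 0.0], [1.0, 1.0])
print(choice)  # 0
```

Counting signs over one batch of samples yields the estimate for every candidate neuron at once, which is why a single forward pass suffices.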

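Another advantage of ReLU splitting is that we can replace the linear relaxation of the split neuron's post-activation $a_j^{(i)}(x) = \mathrm{ReLU}(\hat{a}_j^{(i)}(x))$, namely $\underline{c}^\top x + \underline{d} \leq a_j^{(i)}(x) \leq \overline{c}^\top x + \overline{d}$, with the exact linear functions $a_j^{(i)}(x) = \hat{a}_j^{(i)}(x)$ and $a_j^{(i)}(x) = 0$ on the subregions where the neuron is active and inactive, respectively, as shown in Figure 1 (unstable to stable). This can then tighten the linear bounds for the other neurons, thus tightening the under-approximation on each subdomain.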


Example 5. We now apply our algorithm with ReLU splitting to the vehicle 
parking problem in Example 1. Figure 2d shows the refined preimage polytope 
by adding the splitting plane (black solid line) along the direction of a selected 
unstable ReLU node. Compared with Figure 2b, we can see that the volume of 
the approximation is improved. 


Remark 1 (Preimage Over-approximation). While Algorithms 1 and 2 focus on 
preimage under-approximations, they can be easily configured to generate over-approximations with two key modifications. Firstly, we generate polytope over-approximations by using LiRPA to propagate a linear upper bound $\overline{g}_i(x) = \overline{a}_i^\top x + \overline{b}_i$ for each output constraint, such that $g_i(x) \geq 0 \Rightarrow \overline{g}_i(x) \geq 0$ for $x \in C$. Secondly, the refinement and optimization objective is to minimize the
volume of the over-approximation instead of maximizing the volume as in the 
case of under-approximation. 


4.4 Overall Algorithm 


Our overall preimage approximation method is summarized in Algorithm 1. It 
takes as input a neural network $f$, input region $C$, output region $O$, target polytope volume threshold $v$ (a proxy for approximation precision), maximum iteration number $R$, and a Boolean indicating whether to use input or ReLU splitting, and returns a disjoint polytope union $\mathcal{T}$ representing an under-approximation to the preimage.

Algorithm 3: Quantitative Verification

Input: Neural network f, Property (I, O, p), Maximum iterations R
Output: Verification Result ∈ {True, False, Unknown}
vol(I) ← ExactVolume(I);
C ← OuterBox(I);  // For general polytopes I
T ← InitialRun(f, C, O);
while Iterations < R do
    T ← Refine(f, T, C, O);
    if EstimateVolume(T) ≥ p × vol(I) then
        if ExactVolume(T) ≥ p × vol(I) then
            return True
    if AllReLUSplit then
        return False
return Unknown

The algorithm initiates and maintains a priority queue of (sub)regions according to Equation 2. The initialization step (Lines 1-3) generates an initial polytope approximation on the whole region using Algorithm 2 (Sections 4.1, 4.2). Then, the preimage refinement loop (Lines 4-14) partitions a subregion in each iteration, with the preimage restricted to the child subregions then being re-approximated (Lines 10-11). In each iteration, we choose the region to split (Line 5) and the splitting plane to cut on (Line 7 for input split and Line 9 for ReLU split), as detailed in Section 4.3. The preimage under-approximation is then updated by computing the priorities for each subregion (by approximating volumes) (Lines 12-14). The loop terminates and the approximation is returned when the target volume threshold $v$ or maximum iteration limit $R$ is reached.
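The control flow of this refinement loop can be illustrated with a toy, self-contained Python sketch. Everything problem-specific is a hypothetical stand-in: the "preimage" is a half-space, Algorithm 2 is replaced by an exact check of whether a leaf box lies inside it, splitting uses the longest-edge rule, and the stopping test uses the exact covered volume rather than a Monte Carlo estimate. What it does show is the anytime structure: the covered leaves always form a disjoint union inside the preimage, and a max-priority queue decides which leaf to refine next.

```python
import heapq
import itertools
import math
import random


def make_toy_problem(a, c):
    """Hypothetical stand-ins: the 'preimage' is {x : a.x <= c}, and the
    per-leaf under-approximation is the whole leaf box if the box lies
    entirely inside that set, else empty (this replaces Algorithm 2)."""
    def in_preimage(x):
        return sum(ai * xi for ai, xi in zip(a, x)) <= c

    def covers(box):
        lo, hi = box
        worst = sum(ai * (h if ai > 0 else l) for ai, l, h in zip(a, lo, hi))
        return worst <= c

    return in_preimage, covers


def volume(box):
    lo, hi = box
    return math.prod(h - l for l, h in zip(lo, hi))


def priority(box, in_preimage, rng, n=200):
    """MC estimate of Eq. (2) for an uncovered leaf (its polytope is empty)."""
    lo, hi = box
    hits = sum(in_preimage([rng.uniform(l, h) for l, h in zip(lo, hi)]) for _ in range(n))
    return volume(box) * hits / n


def refine(C, in_preimage, covers, v, R, seed=0):
    """Anytime loop: split the highest-priority leaf until the covered volume
    reaches v or R iterations have been used."""
    rng = random.Random(seed)
    tie = itertools.count()          # tie-breaker so boxes are never compared
    covered, heap = [], []

    def add_leaf(box):
        if covers(box):
            covered.append(box)      # fully covered leaves need no refinement
        else:
            heapq.heappush(heap, (-priority(box, in_preimage, rng), next(tie), box))

    add_leaf(C)
    for _ in range(R):
        if sum(map(volume, covered)) >= v or not heap:
            break
        _, _, (lo, hi) = heapq.heappop(heap)                  # largest gap first
        i = max(range(len(lo)), key=lambda k: hi[k] - lo[k])  # longest edge
        mid = 0.5 * (lo[i] + hi[i])
        add_leaf((lo, hi[:i] + (mid,) + hi[i + 1:]))
        add_leaf((lo[:i] + (mid,) + lo[i + 1:], hi))
    return covered, sum(map(volume, covered))


in_pre, covers = make_toy_problem(a=(1.0, 1.0), c=1.0)   # true volume 0.5
boxes, cov = refine(((0.0, 0.0), (1.0, 1.0)), in_pre, covers, v=0.45, R=500)
print(len(boxes), round(cov, 3))
```

Stopping the loop at any iteration still returns a sound (if coarser) under-approximation, which is the anytime property described in Section 4.3.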


4.5 Quantitative Verification 


We now show how to use our efficient preimage under-approximation method (Algorithm 1) to verify a given quantitative property $(I, O, p)$, where $O$ is a polyhedron, $I$ a polytope and $p$ the desired proportion value, as summarized in Algorithm 3. To simplify, assume that $I$ is a hyperrectangle, so that we can take $C = I$ (in view of space constraints, the case of general polytopes is discussed in the Appendix of [45]). We utilize Algorithm 1 by setting the volume threshold to $p \times \mathrm{vol}(I)$, such that we have $\frac{\mathrm{vol}(\mathcal{T})}{\mathrm{vol}(I)} \geq p$ if the algorithm terminates before reaching the maximum number of iterations. However, the Monte Carlo estimates of volume cannot provide a sound guarantee that $\frac{\mathrm{vol}(\mathcal{T})}{\mathrm{vol}(I)} \geq p$. To resolve this problem, we propose to run exact volume computation [5] only when the Monte Carlo estimate reaches the threshold. If the exact volume satisfies $\mathrm{vol}(\mathcal{T}) \geq p \times \mathrm{vol}(I)$, then the property is verified. Otherwise, we continue running the preimage refinement.

In Algorithm 3, InitialRun generates an initial approximation to the preim- 
age as in Lines 1-3 of Algorithm 1, and Refine performs one iteration of approx- 
imation refinement (Lines 5-14). Termination occurs when we have verified or 
falsified the quantitative property, or when the maximum number of iterations 
has been exceeded. 
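The decision step of Algorithm 3 can be sketched as below. Axis-aligned boxes stand in for the polytopes of the DUP so that exact volumes are trivial to compute (general polytopes need a dedicated exact volume tool such as the one cited as [5]), and `quantitative_check` is a hypothetical helper that covers only the "estimate first, then exact" test; the `AllReLUSplit` branch returning False is not modeled.

```python
import math
from itertools import combinations


def box_volume(box):
    lo, hi = box
    return math.prod(h - l for l, h in zip(lo, hi))


def interiors_disjoint(b1, b2):
    """Two boxes overlap only if their ranges overlap in EVERY dimension."""
    (lo1, hi1), (lo2, hi2) = b1, b2
    return any(h1 <= l2 or h2 <= l1 for l1, h1, l2, h2 in zip(lo1, hi1, lo2, hi2))


def exact_union_volume(boxes):
    """Summing the parts is exact only because the union is disjoint."""
    assert all(interiors_disjoint(a, b) for a, b in combinations(boxes, 2))
    return sum(box_volume(b) for b in boxes)


def quantitative_check(boxes, vol_I, p, estimated_volume):
    """One check of the Algorithm 3 loop body (hypothetical helper): the exact,
    expensive volume is computed only once the cheap estimate reaches p*vol(I)."""
    threshold = p * vol_I
    if estimated_volume < threshold:
        return "continue"        # keep refining
    if exact_union_volume(boxes) >= threshold:
        return "verified"        # sound: exact volume of an under-approximation
    return "continue"            # the estimate was optimistic; keep refining


# Disjoint boxes standing in for the polytopes of the DUP; exact volume 0.95.
boxes = [((0.0, 0.0), (0.5, 1.0)), ((0.5, 0.0), (1.0, 0.9))]
print(quantitative_check(boxes, vol_I=1.0, p=0.9, estimated_volume=0.96))   # verified
print(quantitative_check(boxes, vol_I=1.0, p=0.96, estimated_volume=0.97))  # continue
```

The disjointness assertion matters: for overlapping parts, the sum of part volumes would overcount the union and could wrongly verify the property.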
Proposition 3. Algorithm 3 is sound for quantitative verification with input splitting.


Proposition 4. Algorithm 3 is sound and complete for quantitative verification on piecewise linear neural networks with ReLU splitting.


5 Experiments 


We have implemented our approach as a prototype tool³ for preimage approximation for polyhedron-type output sets/specifications. In this section, we perform experimental evaluation of the proposed approach on a set of benchmark tasks and demonstrate its effectiveness in approximation generation and its application to quantitative analysis of neural networks.


5.1 Benchmark and Evaluation Metric 


We evaluate our preimage analysis approach on a benchmark of reinforcement 
learning and image classification tasks. Besides the vehicle parking task [3] shown 
in the running example, we use the following (trained) benchmarks: (1) aircraft 
collision avoidance system (VCAS) [21] with 9 feed-forward neural networks 
(FNNs); (2) neural network controllers from VNN-COMP 2022 [1] for three 
reinforcement learning tasks (Cartpole, Lunarlander, and Dubinsrejoin) [9]; and 
(3) the neural network from VNN-COMP 2022 for MNIST classification. Details 
of the models and additional experiments can be found in the Appendix of [45].

Evaluation Metric To evaluate the quality of the preimage approximation, we define the coverage ratio to be the ratio of the volume covered to the volume of the exact preimage, i.e., $\mathrm{cov}(\mathcal{T}, f_C^{-1}(O)) := \frac{\mathrm{vol}(\mathcal{T})}{\mathrm{vol}(f_C^{-1}(O))}$. Note that this is a normalized measure for assessing the quality of the approximation, as used in Algorithm 3 when comparing with the target coverage proportion $p$ for termination of the refinement loop, and not a measure for formal verification guarantees. In practice, we estimate $\mathrm{vol}(f_C^{-1}(O))$ as $\widehat{\mathrm{vol}}(f_C^{-1}(O)) = \mathrm{vol}(C) \times \frac{1}{N} \sum_{j=1}^{N} \mathbb{1}_{f(x_j) \in O}$, where $x_1, \ldots, x_N$ are samples from $C$. In Algorithm 1, the target volume (stopping criterion) is set as $v = r \times \widehat{\mathrm{vol}}(f_C^{-1}(O))$, where $r$ is the target coverage ratio.


5.2 Evaluation 


Effectiveness in Preimage Approximation with Input Split We apply Al- 
gorithm 1 with input splitting to the input bounding problem for low-dimensional 
reinforcement learning tasks to evaluate its effectiveness. For comparison, we also 
run the exact preimage (Exact) [27] and preimage over-approximation (Invprop) 
[23, 24] methods. 

Vehicle Parking & VCAS. Table 1 presents experimental results on the vehicle 
parking and VCAS tasks. In the table, we show the number of polytopes (#Poly) 


³ The source code is at https://github.com/Zhang-Xiyue/PreimageApproxForNNs.
Table 1: Performance comparison in preimage generation.

| Models | Exact #Poly | Exact Time(s) | Invprop Time(s) | Invprop Cov(%) | Our #Poly | Our Time(s) | Our Cov(%) |
|---|---|---|---|---|---|---|---|
| Vehicle (FNN 1×20) | 10 | 3110.979 | 2.642 | 92.1 | 4 | 1.175 | 95.7 |
| VCAS (FNN 1×21) | 131 | 6363.272 | - | - | 12 | 11.281 | 91.0 |


Table 2: Performance of preimage approximation for reinforcement learning tasks.

| Task | Property | Config | #Poly | Cov(%) | Time(s) |
|---|---|---|---|---|---|
| Cartpole (FNN 2×64) | $\{y \in \mathbb{R}^2 : y_1 \geq y_2\}$ | $\dot{\theta} \in [-2, -1]$ | 8 | 82.0 | 8.933 |
| | | $\dot{\theta} \in [-2, -0.5]$ | 17 | 75.5 | 14.527 |
| | | $\dot{\theta} \in [-2, 0]$ | 32 | 76.5 | 27.344 |
| Lunarlander (FNN 2×64) | $\{y \in \mathbb{R}^4 : \bigwedge_{i \in \{1,3,4\}} y_2 \geq y_i\}$ | $\omega \in [-0.5, 0]$ | 38 | 15.5 | 34.311 |
| | | $\omega \in [-1, 0]$ | 71 | 75.1 | 63.333 |
| | | $\omega \in [-2, 0]$ | 159 | 75.0 | 134.929 |
| Dubinsrejoin (FNN 2×256) | … | … ∈ [-0.1, 0.1] | 26 | 75.8 | 34.558 |
| | | … ∈ [-0.2, 0.2] | 61 | 75.4 | 78.437 |
| | | … ∈ [-0.3, 0.3] | 1002 | 57.6 | 1267.272 |


in the preimage, computation time (Time(s)), and the approximate coverage ra- 
tio (Cov(%)) when the preimage approximation algorithm terminates with target 
coverage 90%. Compared with the exact method, our approach yields orders-of- 
magnitude improvement in efficiency. It can also characterize the preimage with 
far fewer (and also disjoint) polytopes (average reduction of 91.1% for VCAS).

The Invprop method [23] cannot be directly applied as it computes preim- 
age over-approximations. We adapt it to produce an under-approximation by 
computing over-approximations for the complement of each output constraint; 
the resulting approximation is then the complement of a union of polytopes, 
rather than a DUP. On the 2D vehicle parking task, we find that the results 
(see Table 1) are comparable with ours in time and approximation coverage. 
Their implementation currently only supports two-dimensional input tasks [24]. 
While their algorithm, which employs input splitting, can in theory be extended 
to higher-dimensional tasks, a significant unaddressed technical challenge is in 
how to choose the input splits effectively in high dimensions. This is compounded
by the fact that, to generate an under-approximation, we need separate runs of 
their algorithm for each output constraint. In contrast, our method naturally in- 
corporates a principled splitting and refinement strategy, and can also effectively 
employ ReLU splitting for further scalability, as we will show below. Our method 
can also be configured to generate over-approximations (Section 4.3, Remark 1). 

Neural Network Controllers. In this experiment, we consider preimage under- 
approximation for neural network controllers in reinforcement learning tasks. 
Note that [27] (Exact) is unable to deal with neural networks of these sizes and [23, 24] (Invprop) does not support these higher-dimensional input domains. Table 2 summarizes the experimental results. We evaluate Algorithm 1 with input split on a range of tasks/properties and configurations of the input region (e.g., angular velocity $\dot{\theta}$ for Cartpole). Empirically, for the same coverage ratio, our method requires a number of polytopes and time roughly linear in the input region size, with the exception of Dubinsrejoin, where the larger number of output constraints and larger network size contribute to greater relaxation error.

Table 3: Refinement with ReLU split for MNIST (FNN 6×100)

| $L_\infty$ attack | #Poly | Cov(%) | Time(s) | Patch attack | #Poly | Cov(%) | Time(s) |
|---|---|---|---|---|---|---|---|
| 0.05 | 2 | 100.0 | 3.107 | 3×3 (center) | 1 | 100.0 | 2.611 |
| 0.07 | 247 | 75.2 | 121.661 | 4×4 (center) | 678 | 38.2 | 455.988 |
| 0.08 | 522 | 75.1 | 305.867 | 6×6 (corner) | 2 | 100.0 | 9.065 |
| 0.09 | 733 | 16.5 | 507.116 | 7×7 (corner) | 7 | 84.2 | 10.128 |
MNIST Preimage Approximation with ReLU Split Next, we evaluate the 
scalability of Algorithm 1 with ReLU splitting by applying it to MNIST image 
classifiers. To our knowledge, this is the first time preimage computation has 
been attempted for this challenging, high-dimensional task. 

Table 3 summarizes the evaluation results for two types of image attacks: 
$L_\infty$ and patch attacks. For $L_\infty$ attacks, bounded perturbation noise is applied to all image pixels. The patch attack applies only to a smaller patch area but allows arbitrary perturbations covering the whole valid range $[0, 1]$. The task is then to produce a DUP under-approximation of the perturbation region that is guaranteed to be classified correctly. For $L_\infty$ attacks, our approach generates
a preimage approximation that achieves the targeted coverage of 75% for noise 
up to 0.08. Notice that, from e.g. 0.05 to 0.07, the volume of the input region 
increases by tens of orders of magnitude due to the high dimensionality. The 
fact that the number of polytopes and computation time remains manageable is 
due to the effectiveness of ReLU splitting. Interestingly, for the patch attack, we 
observe that the number of polytopes required increases sharply when increasing 
the patch size at the center of the image, while this is not the case for patches 
in the corners of the image. We hypothesize this is due to the greater influence 
of central pixels on the neural network output, and correspondingly a greater 
number of unstable neurons over the input perturbation space. 

Comparison with Robustness Verifiers We now illustrate empirically the utility of preimage computation in robustness analysis compared to robustness verifiers. Table 4 shows comparison results with α,β-CROWN, winner of the VNN competition [1]. We set the tasks according to the problem instances from VNN-COMP 2022 for local robustness verification (localized perturbation regions). For Cartpole, α,β-CROWN can provide a verification guarantee (yes/no or safe/unsafe) for both of the problem instances. However, in the case where the robustness property does not hold, our method explicitly generates a preimage approximation in the form of a disjoint polytope union (where correct classification is guaranteed), which covers 94.9% of the exact preimage. For MNIST, while the smaller perturbation region is successfully verified, α,β-CROWN with intermediate bounds tightened by MIP solvers returns unknown with a timeout of 300s for the larger region. In comparison, our algorithm provides a concrete union of polytopes where the input is guaranteed to be correctly classified, which we find covers 100% of the input region (up to sampling error). Note also (Table 3) that our algorithm can produce non-trivial under-approximations for input regions far larger than α,β-CROWN can verify.

Table 4: Comparison with a robustness verifier.

| Task | α,β-CROWN Result | α,β-CROWN Time(s) | Our Cov(%) | Our #Poly | Our Time(s) |
|---|---|---|---|---|---|
| Cartpole ($\theta \in [-1.642, -1.546]$) | yes | 3.349 | 100.0 | 1 | 1.137 |
| Cartpole ($\theta \in [-1.642, 0]$) | no | 6.927 | 94.9 | 2 | 3.632 |
| MNIST ($L_\infty$ 0.026) | yes | 3.415 | 100.0 | 1 | 2.649 |
| MNIST ($L_\infty$ 0.04) | unknown | 267.139 | 100.0 | 2 | 3.019 |


Quantitative Verification We now demonstrate the application of our preimage generation framework to quantitative verification of the property $(I, O, p)$; that is, to check whether $f(x) \in O$ for at least a proportion $p$ of input values $x \in I$. This leverages the disjointness of our approximation: since the polytopes are disjoint, the covered volume can be computed exactly as the sum of the exact volumes of the individual polytopes.


Vehicle Parking. We consider the quantitative property with input set $I = \{x \in \mathbb{R}^2 \mid x \in [0, 1]^2\}$, output set $O = \{y \in \mathbb{R}^4 \mid \bigwedge_{i=2}^{4} y_1 - y_i \geq 0\}$, and quantitative proportion $p = 0.95$. We use Algorithm 3 to verify this property, with iteration limit 1000. The computed under-approximation is a union of two polytopes, which takes 0.942s to reach the target coverage. We then compute the exact volume ratio of the under-approximation against the input region. The final quantitative proportion reached by our under-approximation is 95.2%, verifying the quantitative property.

Aircraft Collision Avoidance. In this example, we consider the VCAS system and a scenario where the two aircraft have negative relative altitude from intruder to ownship ($h \in [-8000, 0]$), the ownship aircraft has a positive climbing rate $\dot{h}_A \in [0, 100]$, the intruder has a stable negative climbing rate $\dot{h}_B = -30$, and the time to the loss of horizontal separation is $t \in [0, 40]$, which defines the input region $I$. For this scenario, the correct advisory is “Clear Of Conflict” (COC). We apply Algorithm 3 to verify the quantitative property where $O = \{y \in \mathbb{R}^9 \mid \bigwedge_{i=2}^{9} y_1 - y_i \geq 0\}$ and the proportion $p = 0.9$, with an iteration limit of 1000. The under-approximation computed is a union of 6 polytopes, which takes 5.620s to reach the target coverage. The exact quantitative proportion reached by the generated under-approximation is 90.8%, which verifies the quantitative property.
6 Related Work


Our paper is related to a series of works on robustness verification of neural 
networks. To address the scalability issues with complete verifiers [20, 22, 35] 
based on constraint solving, convex relaxation [31] has been used for develop- 
ing highly efficient incomplete verification methods [44, 39, 32, 40]. Later works 
employed the branch-and-bound (BaB) framework [11,10] to achieve complete- 
ness, using incomplete methods for the bounding procedure [41, 36, 17]. In this 
work, we adapt convex relaxation for efficient preimage approximation. Further, 
our divide-and-conquer procedure is analogous to BaB, but focuses on maxi- 
mizing covered volume rather than maximizing a function value. There are also 
works that have sought to define a weaker notion of local robustness known as 
statistical robustness [37,26], which requires that a proportion of points under 
some perturbation distribution around an input point are classified in the same 
way. Verification of statistical robustness is typically achieved by sampling and 
statistical guarantees [37, 4, 34, 42]. In this paper, we apply our symbolic approx- 
imation approach to quantitative analysis of neural networks, while providing 
exact quantitative rather than statistical guarantees [38]. 

Another line of related works considers deriving exact or approximate ab- 
stractions of neural networks, which are applied for explanation [33], verifica- 
tion [16, 29], reachability analysis [28], and preimage approximation [15, 23]. [15] 
leverages symbolic interpolants [2] for preimage approximations, facing expo- 
nential complexity in the number of hidden neurons. Concurrently, [23] employs 
Lagrangian dual optimization for preimage over-approximations. Our anytime 
algorithm, which combines convex relaxation with principled splitting strategies 
for refinement, is applicable for both under- and over-approximations. Their 
work may benefit from our splitting strategies to scale to higher dimensions. 


7 Conclusion 


We present an efficient and flexible algorithm for preimage under-approximation 
of neural networks. Our anytime method derives from the observation that linear 
relaxation can be used to efficiently produce under-approximations, in conjunc- 
tion with custom-designed strategies for iteratively decomposing the problem 
to rapidly improve the approximation quality. Unlike previous approaches, it is 
designed for, and scales to, both low and high-dimensional problems. Experi- 
mental evaluation on a range of benchmark tasks shows significant advantage in 
runtime efficiency and scalability, and the utility of our method for important 
applications in quantitative verification and robustness analysis. 
Abstract. With the growing use of deep neural networks (DNNs) in mission- and safety-critical applications, there is an increasing interest in
DNN verification. Unfortunately, increasingly complex network struc- 
tures, non-linear behavior, and high-dimensional input spaces combine 
to make DNN verification computationally challenging. Despite tremen- 
dous advances, DNN verifiers are still challenged to scale to large ver- 
ification problems. In this work, we explore how the number of stable 
neurons under the precondition of a specification gives rise to verifica- 
tion complexity. We examine prior work on the problem, adapt it, and 
develop several novel approaches to increase stability. We demonstrate 
that neuron stability can be increased substantially without compromis- 
ing model accuracy and this yields a multi-fold improvement in DNN 
verifier performance. 


Keywords: neural network verification · neuron stability · pruning


1 Introduction 


In recent years, there has been significant research on adapting formal verification to target deep neural network (DNN) model behavior. Approaches have been developed that incorporate a diverse range of algorithmic approaches including reachability [42, 45, 51, 52], optimization [5, …, 30, 31, 34, 44, 50], and search [9, 21, …]. These techniques aim to verify the validity of a
network’s behavior for a wide range of inputs, e.g., perturbations of test samples 
that capture models of noise or malicious manipulation. 

DNN verification is challenging due to the high input dimension of mod- 
els, the ever-growing complexity of network layers, the inherent non-linearity of 
learned function approximations, and the algorithmically complex methods re- 
quired to formulate the verification problem [5]. Several approaches 
have been proposed to address the scalability issue, but as the results of recent 
DNN verifier competitions show scalability remains a challenge [2122182]. 

Stable neurons exhibit linear behavior and thereby have the potential to reduce DNN verification costs. Several researchers have explored how DNNs can be defined to increase the number of stable neurons and thereby facilitate verification. For example, one can incorporate a loss term that uses an estimate of neuron stability to train a network that can be verified more efficiently [53]. Another training-time approach identifies neurons that are likely to be stable and active and replaces them with linear functions [10], although this approach requires customization of the verifier to show performance improvement.

© The Author(s) 2024
B. Finkbeiner and L. Kovács (Eds.): TACAS 2024, LNCS 14572, pp. 24-44, 2024.
https://doi.org/10.1007/978-3-031-57256-2_2

Training for Verification via Neuron Stabilizers


Whereas prior work studied individual methods for increasing neuron stabil- 
ity in combination with individual verifiers, in this paper we conduct a broad 
exploratory study considering 18 different stabilizers paired with 3 state-of-the- 
art verifiers across DNNs for different datasets and comprising different architec- 
tures. We use three algorithmic approaches to increase stability: RS Loss 
incorporates a stability-oriented loss term, BIAS SHAPING is a novel training 
time method that only modifies bias parameters to increase stability, and STA- 
BLE PRUNING is a novel approach that adapts structural DNN pruning to 
increase stability. These are paired with stability estimation algorithms that operate at training time to guide them towards increasing stability. We develop 4 estimators based on prior work, NIP [53], SIP, ALR [56], and ALRo [57], and 2 novel estimators, SDD and SAD.


Neuron instability can be a source of verification complexity for the two 
primary algorithmic approaches to DNN verification: abstraction-based meth- 
ods and constraint-based methods. Abstraction-based verifiers 
overapproximate neuron behavior, but when the approximation is too coarse — 
due to unstable neurons — the approximations must be refined which can slow 
down verification. Constraint-based verifiers are challenged by the 
disjunctive nature of constraints that encode unstable neurons. Orthogonal to 
these approaches, branch and bound techniques [9, 17, 48] are also sensitive to neuron stability since they need to generate sub-problems for each of the activation phases of unstable neurons. In our exploratory study, we evaluate the performance of verifiers that span several of these algorithmic approaches and that also constitute the state of the art based on their performance in the most recent VNN-COMP [22]. This allows us to assess the extent to which increasing neuron stability can improve the state of the art.


In §5 we report the findings of a study spanning 18 stable training algorithms, 3 state-of-the-art verifiers, 3 network architectures, and a large number of challenging property specifications. Our primary finding is that stable training can significantly increase the number of verification problems solved, by as much as 5-fold, and significantly speed up verification, by as much as a factor of 14, without compromising test accuracy or training time. Moreover, we find that if one is willing to tolerate a modest loss in test accuracy, then even greater improvements in verifier performance can be achieved.


The contributions of this work lie in a comprehensive evaluation of the potential for optimizing DNN verifier performance by increasing the number of stable neurons. More specifically, (1) we adapt RS Loss with different stability estimators and evaluate its performance across multiple verifiers and benchmarks; (2) we propose two novel approaches (BIAS SHAPING and STABLE PRUNING) to increase neuron stability and evaluate their performance across multiple verifiers and benchmarks; (3) we integrate these state-of-the-art neuron stabilizers into an open-source framework that supports experimentation with stability optimization by the DNN verification research community; and (4) we show empirically that the performance of state-of-the-art verifiers can be significantly enhanced using stable training methods. These contributions set the stage for further work on training for verification that aims to characterize the best stable training strategy for a given verifier and verification problem.

Fig. 1. Illustration of applying STABLE PRUNING to verify that a small original network outputs a pair of values, where the first is negative and the second positive, for inputs $x \in [0.3, 0.9] \times [0.1, 0.7]$. Unstable neurons are shown in red, and pruned neurons and their edges are dashed.


2 Overview 


The popularity of the rectified linear unit (ReLU) activation function, $z = \max(\hat{z}, 0)$, which allows for more efficient training and inference [20][29], has led verification researchers to target networks that use it. In this section, we illustrate how ReLU leads to exponential verification costs and how training can mitigate that cost.

For a DNN with ReLU activation functions, $N : \mathbb{R}^n \to \mathbb{R}^m$, comprised of $k$ neurons, an inference, $N(x)$, results in each neuron being either active, when $z = \max(\hat{z}, 0) = \hat{z}$, or inactive, when $z = \max(\hat{z}, 0) = 0$. The status of each neuron in a network during inference defines an activation pattern, $ap(x)$, a Boolean vector of length $k$. Verifying a set of inputs, $\phi_x \subseteq \mathbb{R}^n$, involves symbolically reasoning about the set of activation patterns, and the associated neuron outputs, for each $x \in \phi_x$. In the worst case, there are $2^k$ possible activation patterns, which leads to the exponential complexity of ReLU verification [23].
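To make the notion of an activation pattern concrete, here is a minimal Python sketch for one hidden layer with two neurons. The weights and biases are hypothetical values chosen only for illustration (they are not the network of Fig. 1), and `pre_activation` and `activation_pattern` are names introduced for this example. It computes $ap(x)$ for two inputs taken from the box $[0.3, 0.9] \times [0.1, 0.7]$ and enumerates the $2^k$ Boolean vectors that bound the number of possible patterns.

```python
from itertools import product

# Hypothetical weights for a layer with two hidden neurons (not Fig. 1's).
W = [[1.0, -1.0],   # incoming weights of hidden neuron n_1
     [2.0, 1.0]]    # incoming weights of hidden neuron n_2
b = [0.0, 1.0]      # biases of n_1 and n_2

def pre_activation(x):
    """z_hat = W x + b for one concrete input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]

def activation_pattern(x):
    """ap(x): True where a neuron is active (z_hat > 0), False otherwise."""
    return [z > 0 for z in pre_activation(x)]

k = len(W)
# Upper bound on distinct patterns: every Boolean vector of length k.
all_patterns = list(product([False, True], repeat=k))

print(activation_pattern([0.9, 0.1]))  # [True, True]
print(activation_pattern([0.3, 0.7]))  # [False, True]
print(len(all_patterns))               # 4
```

Both inputs lie in the same box, yet $n_1$ changes phase between them, while $n_2$ is active for both; a verifier reasoning about the whole box must account for both phases of $n_1$.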

For a given set of inputs, $\phi_x$, a neuron, $n_i$, is stable and active if $\forall x \in \phi_x : ap(x)[i]$, and stable and inactive if $\forall x \in \phi_x : \neg ap(x)[i]$. A neuron's stability depends on the computation performed by its cone of influence [6], taking into account both $\phi_x$ and the behavior of the neurons on which $n_i$ depends. In Fig. 1, we consider verification of a local robustness property centered at $x = (0.6, 0.4)$ with a radius of $\epsilon = 0.3$, so $\phi_x = [0.3, 0.9] \times [0.1, 0.7]$. For such inputs, a single neuron, $n_2$, is stable: its pre-activation values are all positive, $\hat{z}_2 = [2.1, 5.1]$.
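The stability definitions can be read off directly from bounds on a neuron's pre-activation. The sketch below assumes such bounds are already available (e.g., from interval propagation); when the bounds are over-approximate, a neuron labeled unstable may in fact be stable, so the labels are conservative. Only the bounds of $n_2$ come from the text; the other two neurons and the helper `classify` are hypothetical.

```python
def classify(lower, upper):
    """Classify a neuron from bounds on its pre-activation over phi_x."""
    if lower > 0:
        return "stable-active"    # ReLU behaves as the identity on phi_x
    if upper <= 0:
        return "stable-inactive"  # ReLU outputs 0 everywhere on phi_x
    return "unstable"             # bounds straddle 0: both phases possible

bounds = {
    "n_2": (2.1, 5.1),    # pre-activation bounds stated for Fig. 1
    "n_a": (-0.2, 2.2),   # hypothetical neuron whose bounds straddle 0
    "n_b": (-3.0, -0.5),  # hypothetical neuron that is always inactive
}
status = {name: classify(lo, hi) for name, (lo, hi) in bounds.items()}
unstable = [name for name, s in status.items() if s == "unstable"]

# Each unstable neuron doubles the number of phase combinations a
# case-splitting verifier may need to consider in the worst case.
worst_case_splits = 2 ** len(unstable)
print(status, worst_case_splits)
```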

In §4 we define a set of techniques that aim to estimate which neurons are unstable during training and then bias the training process to stabilize them. Fig. 1 shows the application of one pair of those techniques to the original network and property. More specifically, the NIP estimator propagates interval approximations of neuron pre-activation values to estimate whether they are stable, and then the STABLE PRUNING technique removes neurons that are stable and inactive. During training, this method estimates the pre-activation value for $n_1$ to be $\hat{z}_1 = [-0.2, 2.2]$, which is nearly stable. STABLE PRUNING ranks neurons based on the distance they need to be shifted to be stable; for $n_1$, that distance is 0.2. We adapt the iterative pruning approach of DropNet to use this ranking. The intuition is that when a neuron is nearly stable it can be removed, and in subsequent training the parameters of the remaining neurons will adapt to compensate and preserve accuracy [18]. As illustrated in Fig. 1, the number of unstable neurons is halved, which can reduce verification costs.


3 Background & Related Work 


Deep Neural Networks (DNN) are trained to accurately approximate a target function, $f : \mathbb{R}^n \to \mathbb{R}^m$. A network, $N : \mathbb{R}^n \to \mathbb{R}^m$, is comprised of a sequence of $L$ hidden layers, $l_1, \ldots, l_L$, along with an input layer, $l_{in} = l_0$, and an output layer, $l_{out} = l_{L+1}$ (e.g., (a) in Fig. 1). Hidden layers are comprised of a set of neurons that accumulate a weighted sum of their inputs from the prior layer and then apply an activation function to determine how to non-linearly scale that sum to compute the output from the layer. Different activation functions have been explored in the literature, including Rectified Linear Units (ReLU), Sigmoid, and Tanh.

Given a neural network architecture, $N(\cdot)$, the network is trained to define weight values, denoted $\theta$, and bias values, denoted $b$, that are associated with each neuron's input. A trained network defines, for input $x$, the output $N(x; \theta, b)$; when it is clear from the context we drop $\theta, b$ and write $N(x)$.


Specifying DNN Properties Given a network $N : \mathbb{R}^n \to \mathbb{R}^m$, a property, $\phi$, defines a set of constraints over the inputs, $\phi_x$, and an associated set of constraints over the outputs, $\phi_y$. Verification of $N \models \phi$ seeks to prove: $\forall x \in \mathbb{R}^n : \phi_x(x) \Rightarrow \phi_y(N(x))$.

Recent work has demonstrated that a general class of specifications, where $\phi_x$ and $\phi_y$ are defined as half-space polytopes, can be reduced to local robustness specifications [25][36]. This means that the essential complexity of DNN verification is present when verifying simpler local robustness specifications, which state that $\forall x \in c \pm \epsilon : \phi_y(N(x))$, for some constant input (center point), $c$, and radius, $\epsilon$, around it. Consequently, in §5 we explore the performance of verifiers on local robustness specifications.
Verifying DNN Properties The inherent complexity of the DNN verification problem arises from the non-linear expressive power of DNNs, so it is generally unavoidable. We explain the source of this complexity below for a network with $L$ fully-connected layers, each with $M$ neurons.

Let $\hat{z}_{i,j}$ denote the value computed for the input of neuron $j$ in hidden layer $i$ prior to the application of the activation function (the pre-activation value), and $z_{i,j}$ the post-activation value. For a ReLU activation function, $z_{i,j} = \max(\hat{z}_{i,j}, 0)$. The input to layer $i$ is computed as the weighted sum of the output of the prior layer, using the learned weights $\theta$ and bias $b$. The semantics of $N(x; \theta, b)$ is given by the constraints shown in Eq. (1):

$$N(x;\theta,b) = \bigwedge_{i \in [1,L],\; j \in [1,M]} \Big( \hat{z}_{i,j} = \sum_{k \in [1,M]} \theta_{i,j,k}\, z_{i-1,k} + b_{i,j} \;\wedge\; z_{i,j} = \max(\hat{z}_{i,j}, 0) \Big) \tag{1}$$

with additional constraints relating the $z_{L,j}$ to the output layer, $l_{out}$, and $x = z_0$.
Computing $N(x)$ for a single input value, $x$, results in a pattern of ReLU activations in which each neuron is either active, $\max(\hat{z}_{i,j}, 0) = \hat{z}_{i,j}$, or inactive, $\max(\hat{z}_{i,j}, 0) = 0$. However, a property specification, $\phi$, constrains $l_{in}$ to define a set of input values, e.g., as in the case of local robustness, $x \in c \pm \epsilon$. Through Eq. (1), this may give rise to constraints on $\hat{z}_{i,j}$ that define values for which the neuron is both active, $\hat{z}_{i,j} > 0$, and inactive, $\hat{z}_{i,j} \leq 0$. When the set of pre-activation values spans 0 in this way, we say that neuron $n_{i,j}$ is unstable.

Unstable neurons require that verification approaches reason about the disjunctions present in Eq. (1). In the worst case, if all neurons are unstable, then there are $2^{L \times M}$ different ways of resolving the disjunctions. More generally, for a property, $\phi$, only a subset of neurons will be unstable, $U_\phi \subseteq L \times M$, giving $2^{|U_\phi|}$ combinations in the worst case, and, as we discuss later, controlling the size of this subset is a means of reducing the cost of DNN verification.

Several approaches have been introduced to verify DNN behavior in recent years. One class of verifiers, including α,β-CROWN [48], NNENUM [3], ERAN [40], and MN-BAB, overapproximates ReLU behavior, which allows them to efficiently calculate an overapproximation of Eq. (1), which we denote $\tilde{N}$. When $\tilde{N} \not\models \phi$, some techniques, like ERAN, simply return unknown, but others, like NNENUM, α,β-CROWN, or MN-BAB, perform a case split on unstable neurons to refine the overapproximation. Another class of verifiers, including MARABOU and PLANET [13], explores the space of case splits to formulate separate constraint queries that constitute verification conditions. Here again, the number of possible case splits leads to exponential complexity.


RS Loss is a regularization technique that induces neuron stability in the training process. The RS Loss, $L_R$, is blended with the regular training loss, $L_T$, to yield a weighted sum as the optimization target, $L = L_T + w_R \times L_R$, where $w_R$ is the hyperparameter that controls the degree of stabilization. The RS Loss term is formulated as $L_R = \sum_i -\tanh(1 + \underline{\hat{z}}_i \times \overline{\hat{z}}_i)$, where $\underline{\hat{z}}$ and $\overline{\hat{z}}$ are the lower and upper bounds of the pre-activation values. NRS Loss is a variant of RS Loss that regularizes the pre-batch normalization (BN) bounds instead of the pre-activation bounds. Whereas RS Loss indirectly biases the network toward neuron stability, in §4 we introduce BIAS SHAPING, which directly manipulates neuron bias towards the same goal.
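Numerically, the RS term rewards bound pairs whose product is positive (both bounds on the same side of 0) and penalizes pairs that straddle 0. The sketch below evaluates $L_R$ and $L$ on plain Python floats; in actual training the bounds would come from a stability estimator and be differentiated through by an autodiff framework, which this sketch omits. The function names `rs_loss` and `total_loss` are hypothetical.

```python
import math

def rs_loss(lower, upper):
    """L_R = sum_i -tanh(1 + l_i * u_i) over per-neuron pre-activation bounds."""
    return sum(-math.tanh(1.0 + l * u) for l, u in zip(lower, upper))

def total_loss(task_loss, lower, upper, w_r):
    """L = L_T + w_R * L_R, the blended training objective."""
    return task_loss + w_r * rs_loss(lower, upper)

# A stable neuron (l * u > 0) contributes less than an unstable one (l * u < 0).
stable_term = rs_loss([0.5], [1.0])      # -tanh(1.5)
unstable_term = rs_loss([-2.0], [1.0])   # -tanh(-1.0), i.e. +tanh(1.0)
print(stable_term < unstable_term)
```

Minimizing $L$ therefore pushes the bound products upward, which moves neurons away from the unstable regime; $w_R$ trades this off against the task loss.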


DropNet is a structured model compression method that generates sparse and reduced neural networks based on the lottery ticket hypothesis [18]. According to the hypothesis, a dense network contains a sub-network that can match the test accuracy of the base network if trained in isolation. DropNet iteratively prunes a predefined percentage of less important neurons by setting their weights to zero. Although the iterative process is resource-intensive, it is needed because the flatness of the error landscape at the end of training limits the fraction of weights that can be pruned; hence, pruning sharply in a single step reduces network accuracy [3].

While pruning was originally concerned only with preserving network accuracy, recent studies have revealed that pruning can significantly increase a network's robustness and help scale robustness verification [59]. Removing the non-linearity of insignificant neurons by converting them to linear functions has been proposed in the literature [10]. However, the presence of linear activation functions in a network can sometimes result in unnecessary computational costs, as networks are expected to handle complex data and linear functions alone are incapable of capturing that complexity. Also, special treatment is required to handle these non-standard architectures in network inference and verification. Thus, we propose to use iterative pruning to remove the redundant non-linearity from the network, using the pre-activation values of the ReLU function during mini-batch training. In §4.3 we present a variant of DropNet named STABLE PRUNING that uses stability measures to determine how neurons should be pruned.


4 Approach 


This section presents the two novel neuron stabilization methods, BIAS SHAPING and STABLE PRUNING, as well as six different stability estimators. Alg. 1 shows the general training iterations for a neural network with stabilizers (pairs of a stabilization method, $\mathcal{A}$, and a stability estimator, $\mathcal{B}$). The conventional neural network training process of a mini-batch is shown in Line 2. Stabilizers are applied at every $s$-th mini-batch (Line 3). Line 4 determines each neuron's stability estimation by calculating its bounds, $Z$, using the different estimators described in §4.1. Lastly, Line 5 applies the main stabilization algorithms, e.g., BIAS SHAPING (Alg. 2) and STABLE PRUNING (Alg. 3).


Alg. 1: Training with Stabilizers
input : neural network $N$, data loader $D$, stabilization method $\mathcal{A}$, stability estimator $\mathcal{B}$, ratio $i$, and step $s$
output: stabilized network $N'$
1 for $j, (X, Y)$ in $D$ do
2     TRAIN_MINI_BATCH($N$, $X$, $Y$)
3     if $j \equiv 0 \pmod{s}$ then
4         $Z \leftarrow$ ESTIMATE_STABILITY($\mathcal{B}$, $N$)
5         $N' \leftarrow$ STABILIZE($\mathcal{A}$, $N$, $Z$, $i$)
6 return $N'$
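Alg. 1 translates almost line for line into Python. In the sketch below, the training step, the estimator, and the stabilization method are passed in as plain callables with hypothetical signatures, so it shows only the control flow. Two details are our assumptions: the index $j$ starts at 0, and the stabilized network is fed back into subsequent iterations, which is one reading of line 5.

```python
def train_with_stabilizers(network, loader, train_mini_batch,
                           estimate_stability, stabilize, ratio, step):
    """Sketch of Alg. 1. All callables are supplied by the caller:
    train_mini_batch(N, X, Y) performs one ordinary optimizer step,
    estimate_stability(N) plays the role of estimator B and returns Z,
    stabilize(N, Z, ratio) plays the role of method A and returns N'."""
    for j, (X, Y) in enumerate(loader):               # line 1
        train_mini_batch(network, X, Y)               # line 2
        if j % step == 0:                             # line 3
            Z = estimate_stability(network)           # line 4
            network = stabilize(network, Z, ratio)    # line 5
    return network                                    # line 6

# Toy usage with stand-in callables: count how often stabilization runs.
seen = []
result = train_with_stabilizers(
    network={"stabilizations": 0},
    loader=[(x, x) for x in range(10)],
    train_mini_batch=lambda n, X, Y: seen.append(X),
    estimate_stability=lambda n: None,
    stabilize=lambda n, Z, r: {"stabilizations": n["stabilizations"] + 1},
    ratio=0.05,
    step=5,
)
```

With ten mini-batches and `step=5`, every batch is trained on, and stabilization runs after batches 0 and 5.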
4.1 Neuron Stability Estimation


The neural network training process is performed on data samples, while the verification process seeks to prove certain properties over an effectively unbounded set of inputs. Hence, there is a gap between the two stages, since a neuron that is stable on the training dataset is not guaranteed to also be stable over the set of values described by the precondition of the verification problem. Guiding the training process to produce neural networks with more stable neurons in the verification stage requires reducing this gap. This is achieved by estimating neuron stability over a broader set of values, representative of those encountered during the verification process, and then stabilizing the unstable neurons.

We identify two general categories of neuron stability estimators that can be calculated during the training phase: Sampled [S] and Reachability [R] estimators. The sampled estimators consider a finite set of samples gathered directly from, or inferred from, the training dataset. The reachability estimators operate on set propagations that generalize the training dataset. The six neuron stability estimators are defined as follows:


$$\mathcal{B}(D) = \{\, v \mid v = \mathcal{B}(x') \wedge x' \sim D \,\}$$


where $\mathcal{B} \in \{\mathrm{SDD}, \mathrm{SAD}, \mathrm{NIP}, \mathrm{SIP}, \mathrm{ALR}, \mathrm{ALRo}\}$ and $D$ is the network training dataset distribution. The SDD (Sampled Dataset Distribution [S]) estimator uses the training mini-batch samples directly and takes advantage of the training process's forward propagations to determine whether neurons are stable. The SAD (Sampled Adjacent Distribution [S]) estimator samples from the robustness radii of the training mini-batch and runs extra forward propagations on these adjacent examples to determine the stability of neurons. The NIP (Naive Interval Propagation [R]) estimator generates a set of intervals based on the mini-batch samples and the given robustness radii; however, instead of propagating exact samples, it propagates the intervals through the network. The SIP (Symbolic Interval Propagation [R]) estimator extends NIP by using symbolic intervals instead of concrete intervals when propagating through the network. The symbolic intervals are concretized whenever neuron stability needs to be evaluated. The ALR and ALRo (auto_LiRPA [R]) estimators further improve on SIP by applying more precise but computationally expensive over-approximation constraints and by parameterizing the upper and lower bounds of hidden neurons to optimize objectives with respect to the property of interest. Compared to the base ALR approach, ALRo additionally applies α optimization. Note that although many of these approaches were developed for other uses, their integration to induce stable neurons during training is novel.
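The difference between a reachability estimator and a sampled estimator can be seen on a single layer. The sketch below is a NIP-style interval propagation for one center point and a SAD-style estimate that samples inputs uniformly from the same box; the uniform sampling, the sample count, the layer weights, and all function names are illustrative assumptions, not the OCTOPUS implementation.

```python
import math
import random

def interval_affine(W, b, lower, upper):
    """Sound bounds on W x + b for every x in the box [lower, upper]."""
    out_l, out_u = [], []
    for row, bj in zip(W, b):
        lo = hi = bj
        for w, l, u in zip(row, lower, upper):
            # A non-negative weight pairs lower with lower; a negative one swaps.
            lo += w * (l if w >= 0 else u)
            hi += w * (u if w >= 0 else l)
        out_l.append(lo)
        out_u.append(hi)
    return out_l, out_u

def nip_bounds(layers, center, eps):
    """NIP-style estimate: propagate the box center +/- eps through each
    (W, b) layer and collect every hidden layer's pre-activation bounds."""
    lower = [c - eps for c in center]
    upper = [c + eps for c in center]
    bounds = []
    for W, b in layers:
        z_l, z_u = interval_affine(W, b, lower, upper)
        bounds.append((z_l, z_u))
        # ReLU is monotone, so it can be applied to each bound separately.
        lower = [max(v, 0.0) for v in z_l]
        upper = [max(v, 0.0) for v in z_u]
    return bounds

def sad_bounds(W, b, center, eps, n_samples=200, seed=0):
    """SAD-style estimate for one layer: sample inputs uniformly from
    center +/- eps and record the observed min/max pre-activations."""
    rng = random.Random(seed)
    lows = [math.inf] * len(W)
    highs = [-math.inf] * len(W)
    for _ in range(n_samples):
        x = [c + rng.uniform(-eps, eps) for c in center]
        for j, (row, bj) in enumerate(zip(W, b)):
            z = sum(w * xi for w, xi in zip(row, x)) + bj
            lows[j] = min(lows[j], z)
            highs[j] = max(highs[j], z)
    return lows, highs

W1, b1 = [[1.0, -1.0], [2.0, 1.0]], [0.0, 1.0]   # hypothetical layer
nip_l, nip_u = nip_bounds([(W1, b1)], [0.6, 0.4], 0.3)[0]
sad_l, sad_u = sad_bounds(W1, b1, [0.6, 0.4], 0.3)
```

Sampling only observes values that are actually reachable, so the SAD-style range always lies inside the NIP-style interval; the sampled estimate is cheaper to refine but can miss instability, while the interval estimate can over-report it.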


4.2 Bias Shaping 


To increase the number of stable neurons in the neural network, we adapt training to ensure that the lower and upper bounds of neuron pre-activation values have the same polarity. In Eq. (1), the pre-activations of the current ReLU function are controlled by the parameters of the neural network and the post-activations of the previous layer. The weighted-sum term depends on the weights, bias, and the post-activations of the previous layer. The pre-activation values can be easily manipulated by changing the bias term, since the bias shifts the lower and upper pre-activation bounds by the same amount. We refer to this as BIAS SHAPING, as described in Alg. 2.


Instead of using just the native pre-activations of the mini-batch samples, the stability estimators are applied to further close the gap between neuron stability during training and verification. Alg. 2 takes as inputs the set of stability estimations for all neurons, $Z = [\hat{z}_1, \hat{z}_2, \ldots, \hat{z}_m]$, the neural network $N$ with $m$ neurons $(n_1, n_2, \ldots, n_m)$, and the ratio $i$. Line 1 calculates the lower and upper bounds of the estimation $Z$. Using those bounds, the algorithm first finds the unstable neurons of the input network (line 2). Next, those neurons are ranked based on their distance to zero (line 3), and the smallest subset of neurons whose distances are less than an adaptive threshold $\gamma$ (line 4) is selected for shaping (lines 5-9). Note that the number of selections is controlled by the parameter $i$, the percentage of neurons shaped at a time. The bias term of each neuron in the subset is modified by (a) shifting it left by the value of the upper bound if the upper bound is closer to zero (line 7), or (b) shifting it right by the absolute value of the lower bound if the lower bound is closer to zero (line 9). As a result, the stabilized network is created by loading the new parameters at line 10.

Alg. 2: BIAS SHAPING
input : neural network $N$, stability estimation boundaries $Z$, ratio $i$
output: stabilized network $N'$
1  $\underline{Z}, \overline{Z} \leftarrow$ GET_BOUNDS($Z$)
2  $N_u \leftarrow \{ n_i \in N \mid \underline{\hat{z}}_i < 0 \wedge \overline{\hat{z}}_i > 0 \}$
3  $Z_u \leftarrow \{ \min(-\underline{\hat{z}}_i, \overline{\hat{z}}_i) \mid n_i \in N_u \}$
4  $\gamma \leftarrow$ SORT($Z_u$)[$|Z_u| \times i$]
5  for $n_i \in N_u$ do
6      if $(\overline{\hat{z}}_i < \gamma) \wedge (\overline{\hat{z}}_i < -\underline{\hat{z}}_i)$ then
7          $b_i \leftarrow b_i - \overline{\hat{z}}_i$
8      else if $-\underline{\hat{z}}_i < \gamma$ then
9          $b_i \leftarrow b_i + |\underline{\hat{z}}_i|$
10 $N' \leftarrow$ LOAD_PARAMETERS($N$, $b$)
11 return $N'$
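The following Python sketch mirrors Alg. 2 on a single layer represented as flat lists of biases and pre-activation bounds. It assumes $\lfloor |Z_u| \times i \rfloor$ as the threshold index and clamps it so that $i = 1$ does not index past the end; these details, and the function name `bias_shaping`, are our assumptions rather than the OCTOPUS implementation. Because the bias enters the pre-activation additively, subtracting the upper bound (or the negative lower bound) moves the corresponding bound exactly to 0.

```python
def bias_shaping(biases, lower, upper, ratio):
    """Sketch of Alg. 2 for one flat list of neurons; returns new biases."""
    new_b = list(biases)
    # Line 2: unstable neurons have bounds that straddle zero.
    unstable = [k for k in range(len(biases)) if lower[k] < 0 < upper[k]]
    if not unstable:
        return new_b
    # Line 3: distance to zero is the smaller shift that stabilizes the neuron.
    dist = sorted(min(-lower[k], upper[k]) for k in unstable)
    # Line 4: adaptive threshold gamma (index clamped to stay in range).
    gamma = dist[min(int(len(dist) * ratio), len(dist) - 1)]
    for k in unstable:                                   # lines 5-9
        if upper[k] < gamma and upper[k] < -lower[k]:
            new_b[k] -= upper[k]   # shift left: upper bound moves to 0
        elif -lower[k] < gamma:
            new_b[k] -= lower[k]   # shift right by |lower|: lower bound moves to 0
    return new_b

shaped = bias_shaping([0.0, 0.0, 0.0, 0.0],
                      lower=[-0.1, -2.0, -1.0, 0.5],
                      upper=[2.0, 0.2, 1.0, 1.0],
                      ratio=0.75)
print(shaped)  # [0.1, -0.2, 0.0, 0.0]
```

Here the last neuron is already stable and is skipped. With `ratio=0.75` the threshold is the third-smallest distance (1.0), so the first two unstable neurons are shifted, while the third, whose bounds are symmetric around 0, is left unchanged.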


4.3 Stable Pruning 


Inspired by the DropNet approach, we developed a new pruning method to reduce unstable neurons, named STABLE PRUNING, as shown in Alg. 3. It uses iterative structured pruning to modify the global weight matrix by selectively masking neurons. Its novel criteria specifically target unstable neurons for masking. STABLE PRUNING sets a neuron's weights and bias to zero to softly “remove” it from the network, allowing back-propagation to recover the accuracy lost through these harsh parameter modifications.

Given the stability estimation $\hat{z}$ for a neuron, $\underline{\hat{z}}$ and $\overline{\hat{z}}$ denote its lower and upper bounds, respectively. When the lower bound $\underline{\hat{z}}$ is greater than 0, the neuron is stable-active, but it cannot be pruned without changing the network's behavior, as the ReLU function acts as an identity function. When $\overline{\hat{z}}$ is less than 0, the ReLU function acts as a zero function, and the neuron can be removed safely (line 3). In order to prune unstable neurons with minimal effect on network behavior, STABLE PRUNING ranks the unstable neurons by the distance between $\overline{\hat{z}}$ and 0, from smallest to largest (line 4), and a subset of neurons (also controlled by the ratio parameter, $i$) is selected for pruning if their distances are less than an adaptive threshold $\gamma$ (line 5). Initially, all neurons are enabled in the mask, $m$ (line 1), and those that fall below the threshold are updated to be removed from the network (line 6). Finally, the stabilized network is generated by applying the pruning mask to the network (line 7).
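A corresponding Python sketch of Alg. 3 works on a single layer's upper bounds $\overline{\hat{z}}$. Which parameters the mask multiplies is our assumption: here we zero each pruned neuron's incoming weights and its bias, which forces its pre-activation, and therefore its ReLU output, to 0. The threshold index is taken as $\lfloor |X| \times i \rfloor$, clamped to the last element, and the helper names are hypothetical.

```python
def stable_pruning_mask(upper, ratio):
    """Sketch of Alg. 3, lines 1-6: mask entry 1 keeps a neuron, 0 prunes it."""
    mask = [1] * len(upper)                          # line 1
    for k, u in enumerate(upper):                    # line 3: stable-inactive
        if u < 0:
            mask[k] = 0
    positive = sorted(u for u in upper if u > 0)     # line 4
    if positive:
        idx = min(int(len(positive) * ratio), len(positive) - 1)
        gamma = positive[idx]                        # line 5
        for k, u in enumerate(upper):                # line 6
            if u < gamma:
                mask[k] = 0
    return mask

def apply_mask(weights, biases, mask):
    """Line 7 (N' = N * m, elementwise): zero each pruned neuron's incoming
    weights and bias."""
    new_w = [[w * m for w in row] for row, m in zip(weights, mask)]
    new_b = [b * m for b, m in zip(biases, mask)]
    return new_w, new_b

mask = stable_pruning_mask([-1.0, 0.5, 2.0, 0.1, 3.0], ratio=0.5)
new_w, new_b = apply_mask([[1.0, 2.0]] * 5, [1.0] * 5, mask)
print(mask)  # [0, 0, 1, 0, 1]
```

The first neuron is removed as stable-inactive; the threshold is the third of the four positive upper bounds (2.0), so the two neurons closest to stability (upper bounds 0.1 and 0.5) are pruned as well.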


4.4 Implementation 


We implemented all of the above techniques, including SDD, SAD, NIP, SIP, ALR, ALRo, RS Loss (§3), BIAS SHAPING (§4.2), and STABLE PRUNING (§4.3), in the OCTOPUS framework. OCTOPUS allows training neural networks with stabilization methods and stability estimators, including their free combinations. It can be easily applied to different datasets and network architectures and presents a rich hyper-parameter space that can be tuned by hand or algorithmically, e.g., by search methods. RS Loss is reimplemented to support all the additional neuron stability estimators. The SIP estimator uses the Symbolic Interval Analysis Library developed in [46], and the ALR and ALRo estimators integrate the auto_LiRPA library. OCTOPUS also allows combinations of various neuron stabilizers and estimators, i.e., training with multiple stabilizers sequentially or simultaneously. The framework is built for ease of extension to adopt new techniques and is available at both FigShare and GitHub³.

Alg. 3: STABLE PRUNING
input : neural network $N$, stability estimation boundaries $Z$, ratio $i$
output: stabilized network $N'$
1 $m \leftarrow \mathbf{1}$
2 $\overline{Z} \leftarrow$ GET_UPPER_BOUND($Z$)
3 $m[\overline{Z} < 0] \leftarrow 0$
4 $X \leftarrow$ SORT($\overline{Z}[\overline{Z} > 0]$)
5 $\gamma \leftarrow X[|X| \times i]$
6 $m[\overline{Z} < \gamma] \leftarrow 0$
7 $N' \leftarrow N \odot m$
8 return $N'$


5 Evaluation 


We explore two research questions to understand how stabilizers can be beneficial for DNN verification:

RQ1. How effective are the stabilizers in increasing the proportion of stable neurons?

RQ2. How effective are the stabilizers in enhancing DNN verification performance?


3 OCTOPUS GitHub link: https://github.com/edwardxuO/octopus
Tab. 1. Experimental parameter space

Parameters              Choices
Architectures           M2: MNIST_FC2 (FC(256) × 2), M6: MNIST_FC6 (FC(256) × 6),
                        C3: CIFAR2020 (Conv(32,5,2), Conv(128,4,2), FC(250))
Verifiers               α,β-CROWN, MN-BAB, NNENUM
Properties              [0, 1, ..., 9]
Epsilon radii           M2, M6: [12e-3, 14e-3, 16e-3, 18e-3, 20e-3]
                        C3: [18e-4, 20e-4, 22e-4, 24e-4, 26e-4]
Stabilization methods   BASELINE, BIAS SHAPING, RS Loss, STABLE PRUNING
Stability estimators    SDD, SAD, NIP, SIP, ALR, ALRo
Seeds                   [0, 1, 2, 3, 4]


5.1 Study Design 


To answer these questions, we design a broad study considering different neural network architectures, specifications, and verifiers. Tab. 1 shows the full experimental parameter space we consider across the research questions.

The annual VNN-COMP DNN verification competition provides a range of benchmarks with standard network and property formats to evaluate state-of-the-art verifiers. These benchmarks cover a variety of network architectures and activation functions. This architectural variety evaluates verifiers' applicability across a range of network graph operations, e.g., ResNets with skip connections, max-pooling layers, non-linear activations, and domain-specific networks. Benchmarks also vary in scale, with some having large numbers of layers, neurons, and parameters, under the assumption that this will yield challenging benchmarks.

We conducted an exploratory study of the VNN-COMP 2022 benchmarks 
and found that 1156 of 1288 (89%) could be solved within 30 seconds. Nearly all 
of the solved problems were proven (UNSAT) with coarse over-approximation 
or falsified (SAT) with adversarial attacks. Such benchmarks do not exhibit the 
exponential complexity that is inherent in DNN verification [23]. To address this 
limitation, we designed a set of benchmarks that are better suited to assessing 
DNN verification algorithm performance. 

Selecting Networks A retrospective analysis of VNN-COMP benchmarks determined that small weakly-regularized networks exhibit exponential complexity, and that medium-sized networks with large numbers of neurons are hard to scale for precise methods, such as branch and bound [8]. Of course, large weakly-regularized networks with large numbers of neurons are even harder, but it was found that these incur significant memory requirements, which makes experimentation challenging, e.g., due to hardware limitations. Based on this analysis, we focus on three small and medium-sized networks with traditional architectures selected from the VNN-COMP 2022 benchmarks, since these proved capable of forcing verifier algorithms to cope with exponential complexity.

Selecting Properties Rather than focusing on a variety of structurally distinct property specifications, we exploit the fact that general reachability properties can be reduced to local robustness properties [7]. This allows us to vary the verification problem difficulty by controlling the robustness property's epsilon radius. Conceptually, we know that verification problems with sufficiently small (large) radii will be verified (falsified): a radius of 0 is trivially verified, and a radius comprising the full input domain requires that a network produce a constant output. Verifier developers have incorporated techniques, like applying adversarial attacks and using coarse overapproximations, to quickly handle such cases [3][48]. To sidestep these verification fast paths and exercise the core verification algorithms in our study, we select epsilon values for properties as follows.

Fig. 2. Solved problems and verification time vs. epsilon radii

For each network, we conducted a preliminary study with varying radii to assess the difficulty of the verification problems. Fig. 2 shows the results for M2 on 50 different center points with the three verifiers. The dashed lines show the number of verified problems and the dotted lines the number of falsified problems (left y-axis). We observe that small epsilon values lead to uniformly verified problems and large epsilon values to uniformly falsified problems. Moreover, one can observe low verification times (right y-axis) in these extreme epsilon regimes, due to the fast-path optimizations.

Our strategy for selecting harder verification properties is to choose a sample of radii around the point where the numbers of verified and falsified problems cross over, e.g., 0.018 in this plot for MN-BAB. We use the crossover point of the best verifier, i.e., the one that solved the most problems, to design the radii shown in Tab. 1. This leads to a balance in verification ground truth between SAT and UNSAT answers, and these more challenging problems force the underlying algorithms to model network behavior more precisely, e.g., by splitting unstable neurons into branch and bound cases.
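The crossover heuristic can be stated as a short search over the preliminary results. The sketch below picks, among the tested radii, the one where the numbers of verified and falsified problems are closest; treating "closest" as the minimum absolute difference is our assumption, the function name is hypothetical, and the counts in the usage example are made up for illustration only (they are not data from the study).

```python
def crossover_radius(radii, verified, falsified):
    """Return the radius at which the verified and falsified counts are
    closest, a simple proxy for the crossover point in a plot like Fig. 2."""
    best = min(zip(radii, verified, falsified),
               key=lambda t: abs(t[1] - t[2]))
    return best[0]

# Hypothetical counts for 50 center points (illustrative, not measured).
radii = [0.010, 0.014, 0.018, 0.022]
verified = [50, 40, 24, 5]
falsified = [0, 5, 22, 44]
print(crossover_radius(radii, verified, falsified))  # 0.018
```

A set of evaluation radii would then be sampled around the returned value, as done for the epsilon radii in Tab. 1.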

Selecting Verifiers Unlike other research that focuses on improving the performance of a single verifier with a single customized pruning technique [59], our goal is to explore how the space of stabilization strategies impacts a range of verification approaches. Towards this goal, we select the three best-performing verifiers from VNN-COMP 2022 that were available: α,β-CROWN, MN-BAB, and NNENUM⁴. Improving the performance of these verifiers will extend the state of the art in scalable DNN verification.

Network Training Stabilizers are incorporated into training, so we use a baseline (BASELINE) trained without any stabilizers using the Adam optimizer with a 10^? learning rate and 0.99 decay for 20 epochs. All stabilizers are customizable with hyperparameters, as described in §4.2 and §4.3. We use the well-tuned parameter for RS Loss introduced in [53], and perform a binary search of the parameter space for BIAS SHAPING and STABLE PRUNING. To elaborate, RS Loss uses always-active scheduling with a $10^{-4}$ weight parameter; BIAS SHAPING uses interval scheduling, activated every 5/25/50 mini-batches, and adjusts 2%/5%/5% of unstable neurons each time it is applied for the M2/M6/C3 architectures, respectively; STABLE PRUNING uses interval scheduling, activated every 5/50/50 mini-batches, with a pruning ratio of 2%/5%/5%, respectively. For each training run, the network with the highest test accuracy over the last five epochs is selected for verification. To account for stochasticity in training, we train each network 5 times and report the mean for each.

These choices for the space of experiments yield a total of 1,215 training tasks and 36,450 verification tasks. Each training task is run on one GTX 1080 Ti GPU with 11 GB of VRAM. Each verification task is run with 8 GB of memory on one core of an Intel Xeon Gold 6130 CPU @ 2.10GHz with a timeout of 300 seconds. The total CPU time spent on training and verification across our experiments is 1858 and 1052 hours, respectively.


5.2 RQ1: Stabilizing Neurons


Stabilizers aim to linearize a portion of the behavior encoded by ReLU activation 
across the set of computations activated for a property precondition. In this 
experiment, we directly measure this by recording the percentage of neurons that 
are stable during verification. We also record model test accuracy to understand 
the trade-offs of the stabilization methods and stability estimators. Existing 
verifiers do not record the number of stable neurons, so we modified an open- 
source DNN verifier, NEURALSAT [11], to record the number of stable neurons 
computed during verification. 

Fig. 3 presents the average test accuracy and the average number of stable neurons computed across the five training seeds for the three architectures, across the stabilizers in the benchmark as described in Tab. 1. The black + sign indicates the BASELINE (Baseline), and the remaining marker shapes represent RS Loss (RS), the BIAS SHAPING (BS) method, and STABLE PRUNING (SP). Six different colors denote the different stability estimators. Across all three architectures, most techniques can increase the number of stable neurons, but some of the techniques lead to a loss in test accuracy. For the M2 architecture, RS Loss with NIP can significantly increase the number of stable neurons, by more than 26 percentage points, without compromising accuracy. For M6, RS Loss yields an even greater increase of 55 percentage points, but in combination with the SIP estimator. For the convolutional C3 network, a very high percentage of neurons are already stable, so only marginal improvement can be achieved. Here the STABLE PRUNING method performs best while preserving accuracy, but it only yields a percentage point increase. For all of the architectures, if one is willing to sacrifice a degree of accuracy, then further increases in stability can be achieved. For example, for M2, BIAS SHAPING can achieve an additional 7 percentage point increase in stable neurons at the cost of just over 1 percentage point in test accuracy.

Fig. 3. Stable neurons (%) vs. test accuracy (%) per model

4 VeriNet performed well in the competition, but it required a custom solver that is not freely available.
Incorporating stabilization in training can increase training time. Fig. 4 shows the average training time for M2 normalized to the BASELINE. The trend for M6 is similar to the other architectures. The clear outlier in terms of cost is the ALRo estimator when used with RS Loss, which incurs more than a 5-fold increase in training time. This overhead even prevents RS Loss from practically training with ALR and ALRo on the C3 architecture. The overhead of most of the other estimators is negligible, including those that yielded significant increases in stable neurons.

Fig. 4. Normalized training time

RQ1 Findings Across the study, there are combinations of stabilization methods and stability estimators that are capable of increasing the number of stable neurons, in many cases substantially, without compromising test accuracy or training time.
Fig. 5. Solved verification problems vs. test accuracy (%)


5.3 RQ2: Enhancing Verification 


RQ1 demonstrates the ability of stabilizers to increase the number of stable 
neurons across a space of verification problems. This question explores whether 
those increases lead to improvements in verifier performance. To assess the gen- 
eralization of the stabilizers to variations of DNN properties, we verify 50 local 
robustness properties per trained network, pairing 10 center points with each of 
the 5 epsilon radii. We run the three selected state-of-the-art verifiers on each 
problem. 
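The following Python sketch makes the notion of a stable neuron concrete for this kind of local robustness problem. It enumerates the 10 × 5 = 50 (center, radius) pairs. For each pair, it counts the hidden ReLU neurons whose pre-activation interval over the L∞ ball does not straddle zero, that is, neurons that are always active or always inactive. The sketch uses interval bound propagation as the stability estimator. This is a simple stand-in and not necessarily one of the estimators studied here. The network weights, centers, and radii are hypothetical.

```python
import itertools
import random


def affine_bounds(lower, upper, weights, bias):
    """Interval arithmetic for x -> W x + b over the box [lower, upper]."""
    out_lo, out_hi = [], []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, x_lo, x_hi in zip(row, lower, upper):
            lo += w * (x_lo if w >= 0 else x_hi)
            hi += w * (x_hi if w >= 0 else x_lo)
        out_lo.append(lo)
        out_hi.append(hi)
    return out_lo, out_hi


def count_stable_relus(layers, center, eps):
    """Count ReLU neurons whose pre-activation interval does not straddle zero
    on the L-infinity ball of radius eps around center (IBP estimate)."""
    lower = [c - eps for c in center]
    upper = [c + eps for c in center]
    stable = 0
    for weights, bias in layers:  # every layer here is a hidden ReLU layer
        pre_lo, pre_hi = affine_bounds(lower, upper, weights, bias)
        # Always active (lo >= 0) or always inactive (hi <= 0) means stable.
        stable += sum(1 for lo, hi in zip(pre_lo, pre_hi) if lo >= 0 or hi <= 0)
        # ReLU maps the interval [lo, hi] to [max(lo, 0), max(hi, 0)].
        lower = [max(lo, 0.0) for lo in pre_lo]
        upper = [max(hi, 0.0) for hi in pre_hi]
    return stable


def random_layer(rng, n_in, n_out):
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, bias


rng = random.Random(0)
layers = [random_layer(rng, 4, 8), random_layer(rng, 8, 8)]  # hypothetical network
centers = [[rng.uniform(0, 1) for _ in range(4)] for _ in range(10)]
radii = [0.001, 0.005, 0.01, 0.02, 0.05]  # hypothetical epsilon radii
problems = list(itertools.product(centers, radii))  # 10 x 5 = 50 properties

for center, eps in problems[:5]:
    print(f"eps={eps}: {count_stable_relus(layers, center, eps)}/16 neurons stable")
```

Interval bounds only loosen as the radius grows, so larger radii generally leave fewer neurons provably stable. This is one intuition for why stabilization matters most on harder problems.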

We measure two metrics to assess verification performance: (1) the number of problems (i.e., network, center-point, and radius combinations) each verifier can solve, that is, for which it produces either a SAT or an UNSAT result, and (2) the time taken to solve those problems. Note that our metrics exclude runs that produce errors, exceed the 300-second timeout, or exceed the 8 GB memory bound. These metrics are standard for assessing verifier performance, and while they are sometimes aggregated, as in PAR2 [55], we keep them separate here to explore them independently.
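The Python sketch below shows one way to compute the two metrics, the PAR2 aggregate they are kept separate from, and a speedup figure, all from per-problem verifier outcomes. The `Run` record and its result labels are hypothetical. PAR2 follows its common definition, in which an unsolved run is charged twice the timeout. The speedup helper divides total baseline time by total stabilized time over the problems both configurations solved. That aggregation is our assumption, and the study's exact aggregation may differ.

```python
from dataclasses import dataclass

TIMEOUT_S = 300.0  # per-problem timeout used in the study


@dataclass
class Run:
    problem_id: str  # hypothetical key for a (network, center point, radius) triple
    result: str      # "sat", "unsat", "timeout", "memout", or "error"
    seconds: float


def is_solved(run, timeout=TIMEOUT_S):
    """A run counts only if it produced SAT or UNSAT within the limits."""
    return run.result in ("sat", "unsat") and run.seconds <= timeout


def num_solved(runs):
    return sum(1 for r in runs if is_solved(r))


def par2(runs, timeout=TIMEOUT_S):
    """Penalized average runtime: each unsolved run is charged 2 x timeout."""
    charges = [r.seconds if is_solved(r, timeout) else 2 * timeout for r in runs]
    return sum(charges) / len(charges)


def speedup(baseline, stabilized):
    """Total baseline time / total stabilized time on problems both solved."""
    base = {r.problem_id: r.seconds for r in baseline if is_solved(r)}
    stab = {r.problem_id: r.seconds for r in stabilized if is_solved(r)}
    common = base.keys() & stab.keys()
    return sum(base[p] for p in common) / sum(stab[p] for p in common)


baseline = [Run("p1", "sat", 120.0), Run("p2", "timeout", 300.0), Run("p3", "unsat", 30.0)]
stabilized = [Run("p1", "sat", 10.0), Run("p2", "unsat", 50.0), Run("p3", "unsat", 5.0)]
print(num_solved(baseline), num_solved(stabilized))  # 2 3
print(par2(baseline), speedup(baseline, stabilized))  # 250.0 10.0
```

Keeping the counts and times separate shows that the stabilized configuration both solves one more problem and is faster on the shared ones. A single PAR2 number would blend those two effects.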

Fig. 5 shows six plots of the number of verification problems solved versus test accuracy across the three architectures using three of the verifiers. The trends in these plots are largely consistent with the findings of RQ1: when more neurons are stable, the verifiers are more effective in solving problems. RS Loss, with different estimators, increases the number of problems solved by a factor of up to 5.92 for these verifier-network combinations without sacrificing test accuracy. As in RQ1, further performance improvements are possible by sacrificing accuracy. For example, on M2, α,β-CROWN can improve by a factor of 1.67 using BIAS SHAPING with a reduction of 1 percentage point in accuracy.

Fig. 6. Verification time speedup vs. test accuracy (%)

The trends shown here are consistent with the performance of α,β-CROWN and NNENUM across the study, but MN-BAB exhibited different behavior. For M2 and M6, the baseline was able to solve all 50 problems, so there is no opportunity for improvement, and almost all the stabilizers maintain all 50 problems solved. Note that the implementation of MN-BAB does not support the C3 architecture. While the number of problems solved does not change for MN-BAB with stabilization, its runtime is reduced, as we discuss below.

Fig. 6 plots the verification time speedup over BASELINE against test accuracy for six verifier-network pairs. We observe a trend similar to the one for the number of neurons stabilized and the number of verification problems solved: stabilization can speed up verification without compromising test accuracy. For MN-BAB on M2, while the number of problems solved did not change, using RS Loss with NIP yielded a 14-fold speedup. For M6, we see a speedup of up to 5-fold with NNENUM, and for C3 more modest speedups for α,β-CROWN. The MN-BAB plot also shows, as observed above, that further speedups, greater than 30-fold, can be achieved if one compromises accuracy by about 1 percentage point.


RQ2 Findings Stabilizing neurons during training can substantially increase 
the number of problems solved and reduce the time required to solve them 
by state-of-the-art DNN verifiers without compromising test accuracy. Further 
improvement in verifier performance can be achieved with a small sacrifice in 
test accuracy. 


5.4 Discussion 


The data show a significant degree of variability in the effectiveness of particular stable training approaches across verifiers and verification problems. Broadly speaking, RS Loss seems to perform well when one is unwilling to sacrifice test accuracy, but the best estimator varies depending on the verifier and problem, with SDD, NIP, and SIP yielding the best performance. For the large convolutional network, STABLE PRUNING also performs well without compromising test
accuracy. We believe this to be consistent with the broader results from the field 
of structured pruning islas], where it has been found that large networks tend 
to be over-parameterized and can thus accommodate significant pruning without 
compromising accuracy. While the study shows that many of the methods can 
yield benefits, we believe that it also demonstrates that certain stabilization ap- 
proaches, e.g., RS Loss with ALRo, are too costly for use in practice. Further 
study should focus on how to select the best stable training approach, and its 
hyperparameters, to yield the best improvement for a given verifier and class 
of verification problems. We believe it will be fruitful to develop such training 
for verification approaches in concert with algorithmic and engineering improve- 
ments to verification algorithms. 


5.5 Threats to Validity 


The chief threats to internal validity relate to whether our measurements of test accuracy, stable neurons, verification problems solved, and verification time were accurate. We tested the accuracy of all stabilizer-trained networks, cross-checked problem solutions across verifiers, and thoroughly tested our instrumentation of NEURALSAT for recording neuron stability. Regarding external validity, while our study was scoped to manage experimental costs, it spanned 3 verifiers, 3 network architectures, 50 property specifications, and 5 seeds. We used fixed sets of training and stabilizer parameters per neural network architecture, which potentially underestimates the benefit that might be observed by customizing parameters. While broadening the study further would be a valuable direction for future work, the scope of the study is sufficient to support the finding that stabilizers can enhance DNN verification across a breadth of contexts.


6 Conclusion 


Verifying neural networks is a challenging task due to their high computational 
complexity. In this work, we propose two novel approaches, BIAS SHAPING and STABLE PRUNING, to enhance the scalability of DNN verifiers by inducing more stable neurons during the training process. In addition, we designed six neuron stability estimators to drive stability-oriented training. Through an extensive study, we found that focusing on stability yields a viable method to achieve training for verification that can significantly improve the ability to solve problems and speed up state-of-the-art verifiers.

Beyond these promising results, we identified further opportunities while working on this project. In the future, we plan to (1) extend our methods to large real-world neural network architectures; (2) explore automatic ways to tune hyperparameters that lead to better performance; (3) further enhance the stabilizers' performance while minimizing accuracy trade-offs; (4) study the applicability of stabilizer combinations; and lastly (5) study the verification algorithms to understand how to customize stabilizers to benefit them the most.
Abstract. We introduce NeuroSynt, a neuro-symbolic portfolio solver
framework for reactive synthesis. At the core of the solver lies a seamless 
integration of neural and symbolic approaches to solving the reactive syn- 
thesis problem. To ensure soundness, the neural engine is coupled with 
model checkers verifying the predictions of the underlying neural models. 
The open-source implementation of NeuroSynt provides an integration 
framework for reactive synthesis in which new neural and state-of-the-art 
symbolic approaches can be seamlessly integrated. Extensive experiments 
demonstrate its efficacy in handling challenging specifications, enhancing 
the state-of-the-art reactive synthesis solvers, with NeuroSynt contribut- 
ing novel solves in the current SYNTCOMP benchmarks. 


1 Introduction 


The reactive synthesis problem [16] seeks to automatically construct an imple- 
mentation from a system’s specification. Rather than delving into the intricate 
nuances of how a system computes, hardware designers can describe what the 
system should achieve and leave implementation details to the synthesis engine. 
We introduce NeuroSynt, a portfolio solver for reactive synthesis that combines 
the efficiency and scalability of neural approaches with the soundness and com- 
pleteness of symbolic solvers. 

The reactive synthesis problem has seen significant progress in recent years 
[12,29,30,50] with active tooling development [1,11,25,27,46,39], and an annual 
competition (SYNTCOMP [36]). However, applications beyond the competition 
to an industrial scale are still limited. The advent of machine learning, empow- 
ered by the advancements in deep learning architecture and hardware accelera- 
tors, has the potential to drastically increase performance in reactive synthesis. 
While deep learning approaches offer efficiency, they lack soundness and com- 
pleteness guarantees, which are essential to the reactive synthesis problem. 

We address this challenge by introducing NeuroSynt, a portfolio solver frame- 
work for reactive synthesis that aims to bridge the gap between soundness, com- 
pleteness, and practical efficiency through the combination of state-of-the-art 
symbolic solvers, model checkers, and deep learning techniques. The integrated
neural solver computes candidate implementations while model-checking tools 
verify the candidate solutions to ensure soundness. To ensure completeness, the 
neural solver is backed up by several state-of-the-art symbolic solvers running in 
parallel. 

In particular, our main contribution is the design and open-source implemen- 
tation of the extensible and efficient portfolio solver. NeuroSynt’s design priori- 
tizes extensibility: Its modular architecture facilitates the seamless integration of 
new models, algorithms, or optimization techniques. This adaptability ensures 
that NeuroSynt remains relevant amidst evolving methodologies, providing re- 
searchers with a unified platform to experiment, validate, and advance their 
innovations in the reactive synthesis domain. 

Additionally, we contribute an advanced neural solver for reactive synthesis 
(based on [57]) that handles larger and more complex specifications, improving 
its performance on real-world instances from SYNTCOMP. 

Our results show that deep learning methods can indeed increase the perfor- 
mance of reactive synthesis tools. NeuroSynt provides smaller solutions faster 
while maintaining soundness and completeness. Our portfolio solver enhances the 
performance of the state-of-the-art Strix [46] by 31 samples on the SYNTCOMP 
2022 benchmark, and the bounded synthesis tool BoSy [27] by 152 samples. No- 
tably, a virtual best solver (VBS) that combines the neural solver with all tools 
in the SYNTCOMP 2022 competition solves an additional 20 instances that a 
VBS without the neural solver could not solve. 


2 Background 


Reactive Synthesis. The reactive synthesis problem is a well-known algorithmic 
challenge that dates back to Church [16,15] as the problem of automatically
constructing an implementation from a system's specification. With the decid- 
ability findings in 1969 [10] (using games) and 1972 [54] (using automata), a long 
history of work on reactive synthesis was initiated. After the introduction of temporal logics in 1977 [51], LTL reactive synthesis was shown to be 2-EXPTIME-complete [52], while the problem is undecidable for distributed systems [53].
Since then, many different approaches have been developed (e.g., [12,29,30,50]) 
and implemented in tools (e.g., [1,11,25,27,38,39,46,55]). Moreover, an annual
competition, the Reactive Synthesis Competition (SYNTCOMP [36]), associ- 
ated with the International Conference on Computer Aided Verification (CAV) 
is organized to track the improvement of algorithms and tooling. 


Linear-time Temporal Logic (LTL). LTL extends propositional logic by introducing the temporal operators $\mathcal{U}$ (until) and $\bigcirc$ (next). Several additional operators can be derived: $\Diamond\varphi = \mathit{true}\,\mathcal{U}\,\varphi$ and $\Box\varphi = \neg\Diamond\neg\varphi$. $\Diamond\varphi$ is interpreted as $\varphi$ will eventually hold in the future and $\Box\varphi$ as $\varphi$ holds globally. Operators can be nested, e.g., $\Box\Diamond\varphi$ states that $\varphi$ has to occur infinitely often. Linear-time Temporal Logic (LTL) [51] is the prototypical temporal logic for expressing requirements of reactive systems. For example, the following formula describes an arbiter: Given two processes and a shared resource, the formula $\Box(r_0 \rightarrow \Diamond g_0) \wedge \Box(r_1 \rightarrow \Diamond g_1) \wedge \Box\neg(g_0 \wedge g_1)$ describes that whenever a process requests ($r$) access to a shared resource, it will eventually be granted ($g$), and that the resource is never granted to both processes at the same time. Formally, the reactive synthesis problem for LTL is defined over the notion of a strategy as follows: An LTL formula $\varphi$ over atomic propositions $AP = I \mathbin{\dot\cup} O$ is realizable if there exists a strategy $f : (2^I)^* \rightarrow 2^O$ that satisfies $\varphi$. We show the formal syntax and semantics of LTL and the definition of a strategy in the full version [20].
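The semantics above can be made concrete on ultimately periodic traces, that is, a finite prefix followed by a loop that repeats forever. On such traces, every LTL formula has a definite truth value. The following is a minimal, unoptimized Python evaluator for illustration only. The tuple encoding of formulas is our own choice, and the code is not part of NeuroSynt. It checks the arbiter specification from the text on small example traces.

```python
def evaluate(phi, trace, loop):
    """Truth value of phi at every position of the lasso word
    trace[0] ... trace[loop-1] (trace[loop] ... trace[-1])^omega.
    Each letter is the set of atomic propositions true at that step."""
    n = len(trace)

    def succ(i):
        return i + 1 if i + 1 < n else loop  # the last position jumps back

    if isinstance(phi, str):  # atomic proposition or the constant "true"
        return [True] * n if phi == "true" else [phi in letter for letter in trace]
    op, *args = phi
    if op == "eventually":  # F phi = true U phi
        return evaluate(("until", "true", args[0]), trace, loop)
    if op == "globally":  # G phi = not F not phi
        return evaluate(("not", ("eventually", ("not", args[0]))), trace, loop)
    vals = [evaluate(arg, trace, loop) for arg in args]
    if op == "not":
        return [not v for v in vals[0]]
    if op == "and":
        return [a and b for a, b in zip(*vals)]
    if op == "implies":
        return [(not a) or b for a, b in zip(*vals)]
    if op == "next":
        return [vals[0][succ(i)] for i in range(n)]
    if op == "until":
        left, right = vals
        result = []
        for i in range(n):
            j, holds = i, False
            for _ in range(n):  # every position reachable from i is seen within n steps
                if right[j]:
                    holds = True
                    break
                if not left[j]:
                    break
                j = succ(j)
            result.append(holds)
        return result
    raise ValueError(f"unknown operator {op!r}")


def G(f):
    return ("globally", f)


def F(f):
    return ("eventually", f)


arbiter = ("and",
           ("and", G(("implies", "r0", F("g0"))), G(("implies", "r1", F("g1")))),
           G(("not", ("and", "g0", "g1"))))

good = [{"r0", "r1"}, {"g0"}, {"g1"}]  # loops back to position 0
starving = [{"r0"}, set()]             # r0 is never granted
print(evaluate(arbiter, good, loop=0)[0])      # True
print(evaluate(arbiter, starving, loop=1)[0])  # False
```

Checking a finite lasso trace is much simpler than model checking a circuit against all environment behaviors. The example only illustrates what the formula demands.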


And-Inverter Graphs. And-Inverter Graphs are directed acyclic graphs that rep- 
resent reactive systems using three fundamental building blocks: the AND gate, 
the inverter (NOT gate), and latches, which can store a single bit for one time- 
step. The graph’s edges define the connections between gates, indicating how sig- 
nals propagate through the circuit. And-Inverter Graphs, especially the AIGER 
format [8,9], are widely used in formal verification and reactive synthesis. The 
AIGER format follows a well-defined specification. The first line contains header 
information: the maximal variable id, the number of inputs, outputs, latches, and 
AND gates in the circuit. The circuit's components follow in this order: inputs, latches, outputs, AND gates, with each component on its own line. Each input, AND gate, and latch defines an even number (variable id) to which other gates and outputs can refer to establish connections between gates. NOT gates are implicitly encoded by the odd version of each number. True and False are encoded by the numbers 1 and 0.
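The Python sketch below reads and simulates the ASCII variant of AIGER (header `aag M I L O A`). In this header, the counts appear in the order inputs, latches, outputs, AND gates, which matches the order of the body. The sketch ignores the optional symbol table and comments and treats latches as reset to 0. It is for illustration only and is not the parser used by NeuroSynt or by the AIGER tools.

```python
def parse_aag(text):
    """Split an ASCII AIGER file ('aag M I L O A' header) into its components."""
    rows = [line.split() for line in text.strip().splitlines()]
    if rows[0][0] != "aag":
        raise ValueError("expected an ASCII AIGER header 'aag M I L O A'")
    _max_var, n_in, n_latch, n_out, n_and = map(int, rows[0][1:6])
    body = rows[1:]
    inputs = [int(r[0]) for r in body[:n_in]]
    body = body[n_in:]
    latches = [(int(r[0]), int(r[1])) for r in body[:n_latch]]  # (literal, next)
    body = body[n_latch:]
    outputs = [int(r[0]) for r in body[:n_out]]
    body = body[n_out:]
    ands = {int(r[0]): (int(r[1]), int(r[2])) for r in body[:n_and]}  # lhs -> rhs
    return inputs, latches, outputs, ands


def lit_value(lit, values, ands):
    """Even literal = variable, odd literal = its negation; 0/1 = false/true."""
    var = lit & ~1
    if var not in values:  # an AND gate not yet evaluated in this step
        rhs0, rhs1 = ands[var]
        values[var] = lit_value(rhs0, values, ands) and lit_value(rhs1, values, ands)
    return values[var] != bool(lit & 1)


def simulate(text, input_trace):
    """Run the circuit step by step, starting with all latches at 0."""
    inputs, latches, outputs, ands = parse_aag(text)
    state = {lit: False for lit, _ in latches}
    out_trace = []
    for vector in input_trace:
        values = {0: False, **dict(zip(inputs, vector)), **state}
        out_trace.append([lit_value(o, values, ands) for o in outputs])
        state = {lit: lit_value(nxt, values, ands) for lit, nxt in latches}
    return out_trace


and_gate = "aag 3 2 0 1 1\n2\n4\n6\n6 2 4\n"  # output 6 = input 2 AND input 4
toggle = "aag 1 0 1 1 0\n2 3\n2\n"          # latch 2 takes its own negation (3)
print(simulate(and_gate, [[True, True], [True, False]]))  # [[True], [False]]
print(simulate(toggle, [[], [], []]))                     # [[False], [True], [False]]
```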


Deep Learning in Formal Methods. Deep Learning methods have been success- 
fully applied to various domains in formal methods. Applications of deep learning 
methods in symbolic reasoning include SAT/SMT solving [4,13,58,59], tempo- 
ral logics such as generating satisfying traces [33], reactive synthesis and re- 
pair [21,42,57], as well as generating symbolic reasoning problems in temporal 
logics and symbolic mathematics [41]. Mathematical reasoning problems, including integration and differential equations, have been approached with transformers [43] and through code generation with Large Language Models (LLMs) [22].
Mathematical reasoning has also been tackled through automatic proof gener- 
ation [44]. More general applications of deep learning to theorem proving are 
guiding the proof search with clause selection for CNF formulas [45] and tactic 
and premise selection/prediction for Coq and HOL light [5,6,34,48]. In contrast 
to proof guidance, LLMs can be used for end-to-end generation and repair of 
proofs in Isabelle/HOL [31]. LLMs have recently also enabled a step towards 
autoformalization of unstructured natural language for theorem proving [37,64] 
and temporal logic [19]. Further, deep learning has had a considerable impact on 
program verification and synthesis, e.g., for termination analysis [3,32], creating loop invariants [49,56,61], and program synthesis/induction [2,18,26,28].
Fig. 1. An overview of NeuroSynt.


3 The Neuro-symbolic Portfolio Solver NeuroSynt 


The portfolio solver provides a unified approach to neural and symbolic methods for reactive synthesis. For a seamless integration of the neural method,
NeuroSynt relies on model checking (for soundness) and is backed up by symbolic 
synthesis tools (for completeness). 


3.1 Overview 


We provide an overview of NeuroSynt in the following. Figure 1 shows the system design of NeuroSynt. With a single call, a sample is 1) translated from TLSF [35], the standardized input format for reactive synthesis, to LTL assumptions and guarantees; 2) fed into the neural solver described in Section 4, whose candidate solutions are verified by a model checker; and 3) simultaneously passed to a symbolic solver. Verifying the neural candidates is feasible because LTL model checking is computationally significantly easier than reactive synthesis (PSPACE [62] vs. 2-EXPTIME [52]). The final result is an implementation in the form of an AIGER [8] circuit, which is either a verified candidate circuit of the neural solver or the circuit returned by the symbolic solver. Depending on the specification's realizability, the circuit represents either the system implementation (proving realizability) or the environment behavior (proving unrealizability).

All components (neural solver, symbolic solver, and model checker) run in isolated Docker containers. All communication channels between components are defined through a standardized API. Therefore, extending, maintaining, and updating tools is decoupled from NeuroSynt's implementation. Currently integrated are tools based on the Python library ML2, including the neural solver described in Section 4, nuXmv [14], NuSMV [17], Spot [23], Strix [46], and BoSy [27]. We use SyFCo [35] to convert from TLSF to assumptions and guarantees in LTL.


3.2 Usage 


Since NeuroSynt comprises multiple tools that operate in conjunction during 
each execution, users must specify arguments to tailor the behavior of these 
tools. We categorize these arguments into tool-specific and general arguments to 
simplify this process. General arguments are unrelated to any specific tool and 
are passed with the execution command in the command-line interface. 

For tool-specific arguments, we use the YAML format [7] to create configuration files that specify the neural engine arguments and the tools chosen for model checking and symbolic synthesis, along with their respective arguments.
These configuration files facilitate reproducibility and provide a structured way 
to manage tool-specific settings. 

Depending on user choice, NeuroSynt can either wait for all tools to finish or time out and report all results, or return the fastest solution. As input, NeuroSynt accepts the standardized format TLSF [35] as well as simple assume-guarantee structured files in LTL.
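The two reporting modes can be sketched with Python's standard `concurrent.futures` module: wait for every tool, or return the fastest decisive answer. The solver functions, their sleep times, and the status strings are hypothetical placeholders. In NeuroSynt, the tools run in separate Docker containers and are called over gRPC rather than in-process.

```python
import concurrent.futures as cf
import time

DECIDED = ("REALIZABLE", "UNREALIZABLE")


def neural_solver(spec):
    time.sleep(0.05)  # placeholder for inference plus model checking
    return ("neural", "REALIZABLE")


def symbolic_solver(spec):
    time.sleep(0.2)   # placeholder for a slower symbolic tool
    return ("symbolic", "REALIZABLE")


def run_portfolio(spec, solvers, wait_for_all=False):
    pool = cf.ThreadPoolExecutor(max_workers=len(solvers))
    try:
        futures = [pool.submit(solver, spec) for solver in solvers]
        if wait_for_all:  # report every tool's result, in solver order
            return [f.result() for f in futures]
        for future in cf.as_completed(futures):  # fastest decisive answer wins
            tool, status = future.result()
            if status in DECIDED:
                return (tool, status)
        return None  # no tool reached a verdict
    finally:
        # Do not block on slower tools (cancel_futures needs Python 3.9+);
        # already running threads still finish in the background.
        pool.shutdown(wait=False, cancel_futures=True)


print(run_portfolio("spec.tlsf", [neural_solver, symbolic_solver]))
print(run_portfolio("spec.tlsf", [neural_solver, symbolic_solver], wait_for_all=True))
```

Threads cannot be forcibly stopped, so a slower tool keeps running after the fastest answer is returned. Process or container isolation, as used by NeuroSynt, makes terminating such tools easier.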

NeuroSynt offers two primary execution commands: benchmark for solving a 
dataset of samples and synthesize for processing individual samples. For bench- 
mark, all results are saved in a CSV file, which can be further analyzed. In 
all other cases, the result is printed to the command line. First, we indicate 
whether the specification was found to be REALIZABLE or UNREALIZABLE, 
after which we print the system in AIGER format [8]. 

We refer to the full version [20] for more usage instructions and examples. 


3.3 Implementation and Extensibility 


The central design goal of NeuroSynt is to provide interfaces that are easy to 
implement when adding and integrating new components. We first describe the 
communication interfaces between components. Secondly, we detail some of the 
messages, and lastly, we describe the options to extend the portfolio solver. 

Each solver or model-checker is isolated in a Docker container and com- 
municates with other components through gRPC interfaces. gRPC is a high- 
performance open-source framework initially developed by Google for building 
remote procedure call (RPC) APIs. Protocol buffers (protobuf) are used as 
the interface definition language, ensuring programming-language-agnostic in- 
terfaces. 

Fig. 2. Communication diagram of gRPC calls for a run of NeuroSynt, calling the symbolic solver and the neural solver, including model checking.

In Figure 2, we show the communication through gRPC APIs for a run of NeuroSynt with one specification. In the first step, each tool is initialized using setup messages, ensuring the components' successful connection. After setup, a synthesis problem call is sent to the symbolic and neural solvers in parallel. Both solvers eventually respond with a synthesis solution. Before responding, the neural solver makes one or more calls to the model checker with candidate solutions, the specification, and the information on whether the specification is suspected to be realizable. The model checker responds with a status and, optionally, a counterexample. The neural solver then selects one solution if multiple candidates have been generated and responds to NeuroSynt. The following paragraphs detail the specific protobuf messages that can be exchanged between components.
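The candidate-checking step described above can be sketched in Python as follows. Each candidate circuit is sent to a model checker together with its suspected realizability, and the first verified candidate is returned. If none verifies, the neural solver reports nonsuccess, so that the symbolic solver's result is used. The `model_check` callable, the candidate tuples, and the first-verified selection rule are illustrative assumptions. The text does not specify how NeuroSynt chooses among several verified candidates.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass
class MCSolution:  # simplified mirror of the MCSolution message
    satisfied: bool
    counterexample: Optional[str] = None


def select_candidate(
    spec: str,
    candidates: Iterable[Tuple[str, bool]],
    model_check: Callable[[str, str, bool], MCSolution],
) -> Tuple[str, Optional[str]]:
    """Return the first candidate the model checker accepts.

    Each candidate is (aiger_circuit, suspected_realizable): a realizable
    candidate is checked as a system strategy, an unrealizable one as an
    environment strategy (how is up to the model checker)."""
    for circuit, realizable in candidates:
        if model_check(spec, circuit, realizable).satisfied:
            return ("realizable" if realizable else "unrealizable", circuit)
    return ("nonsuccess", None)  # defer to the symbolic solver


def toy_model_check(spec, circuit, realizable):
    """Hypothetical stand-in: only the circuit named 'good' verifies."""
    ok = circuit == "good"
    return MCSolution(satisfied=ok, counterexample=None if ok else "error trace")


print(select_candidate("spec", [("bad", True), ("good", True)], toy_model_check))
print(select_candidate("spec", [("bad", False)], toy_model_check))
```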


SetupRequest and SetupResponse. As initialization, the components exchange 
simple messages through a JSON-like object. This message establishes the suc- 
cessful connection and allows the user to provide some tool- but not run-specific 
arguments. In the case of the neural solver, the model name and other parameters are transmitted to load the model into memory. The component then
responds with a simple success flag or error message. 


SynProblem, SynSolution, and UnsoundSynSolution. The SynProblem (request) 
contains an LTL Specification and a set of JSON-like parameters to configure 
the run- and tool-specific arguments, such as timeout or the number of threads. 
The LTL specification is decomposed into guarantees and assumptions, both 
strings in infix or prefix notation. A SynSolution contains the system as the string representation of an AIGER circuit or Mealy machine, a status (realizable, unrealizable, error, timeout, nonsuccess), the calculation duration, and the tool's name. No system needs to be returned if error, timeout, or nonsuccess is reported. The UnsoundSynSolution consists of a SynSolution and an MCSolution
and is returned by the neural solver. We show the protobuf definition for the 
SynSolution, SynProblem, and Specification in Figure 3. More definitions can be 
found in the full version [20]. 
// An LTL synthesis solution. Used as the response message for synthesis.
message LTLSynSolution {
  // AIGER circuit. It is allowed to pass no system, e.g., if a timeout
  // happened.
  optional AigerCircuit circuit = 1;
  // Shows whether the specification was found to be realizable or
  // unrealizable. May not be set, e.g., if a timeout happened.
  optional bool realizable = 2;
  // A status that includes useful information about the run.
  LTLSynStatus status = 3;
  // Additional information should be supplied here if the status value
  // requires more details.
  string detailed_status = 4;
  // Which tool created the response.
  Tool tool = 5;
  // How long the tool took to create the result.
  optional google.protobuf.Duration time = 6;
}

message LTLSynProblem {
  // Defines run- and tool-specific parameters as a map (dict in Python).
  // Typical examples are threads, timeouts, etc. Can be empty.
  map<string, string> parameters = 1;
  // A decomposed specification (assumptions + guarantees).
  DecompLTLSpecification decomp_specification = 2;
}

message DecompLTLSpecification {
  // All input atomic propositions that occur in guarantees or assumptions.
  repeated string inputs = 1;
  // All output atomic propositions that occur in guarantees or assumptions.
  repeated string outputs = 2;
  // A set of guarantees that make up the specification. All inputs and
  // outputs occurring in any guarantee must be part of inputs/outputs.
  repeated LTLFormula guarantees = 3;
  // A set of assumptions that make up the specification. All inputs and
  // outputs occurring in any assumption must be part of inputs/outputs.
  repeated LTLFormula assumptions = 4;
}

Fig. 3. The protobuf definition for a SynSolution, SynProblem, and decomposed LTL 
specification. Slightly simplified for easier comprehension. We refer the reader to the 
artifact and our repository for the full definitions. 


MCProblem and MCSolution. A tool can request its candidate solutions to be 
model-checked by sending an MCProblem request. This message contains a set of 
JSON-like parameters to configure the run- and tool-specific arguments, an LTL 
specification (see SynProblem), and a system and status (see SynSolution). The 
MCSolution contains the status of the model checking, a counterexample in the form of an error trace if the specification is violated, and the duration of the computation.
We show the relevant protobuf definitions in the full version [20]. 


NeuroSynt can be extended in three major ways. New neural solvers, sym- 
bolic solvers, and model-checking tools can be integrated. Although not required, 
we recommend wrapping all components in Docker containers, as this helps reproducibility, portability, and isolation, especially when running on high-performance clusters.
Neural Solver. The neural solver sits at the core of the portfolio solver, with connections to both the model-checking component and the main portfolio solver. This component has to support receiving a SetupRequest and responding with a SetupResponse for initialization. Furthermore, it should respond to SynProblem requests with an UnsoundSynSolution. To verify candidate solutions, the neural solver should initiate communication with the model-checking component; therefore, it should also support sending MCProblem requests and receiving MCSolution responses. The neural solver can be independent of the ML2 library if it implements the two communication interfaces mentioned above. It can also be based on the ML2 library, benefiting from the existing infrastructure ML2 provides.


Model checking tools. A model checker should respond to a SetupRequest with a SetupResponse and, upon receiving an MCProblem request, perform model checking and answer with an MCSolution.


Symbolic Solver. New symbolic solvers can be integrated into NeuroSynt by 
implementing the server side of our generic protocol buffer interface for sym- 
bolic solvers. As for all components, a symbolic solver should implement a setup 
message (SetupRequest, SetupResponse). For a synthesis call, the symbolic solver 
receives a SynProblem, performs the synthesis task, and eventually responds with 
a SynSolution. At the time of writing, we do not require the output of synthe- 
sis tools to be model-checked. However, one can implement the interface to the 
model checking component to increase the trust in the output of new symbolic 
approaches. 
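The component contracts listed above can be summarized as abstract interfaces. The Python sketch below is a hypothetical in-process mirror for illustration only. Real NeuroSynt components implement the server side of gRPC services generated from the protobuf definitions. The class and method names here are ours, not NeuroSynt's.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SynProblem:  # simplified mirror of the SynProblem message
    inputs: List[str]
    outputs: List[str]
    guarantees: List[str]
    assumptions: List[str]
    parameters: Dict[str, str] = field(default_factory=dict)  # e.g. timeout


@dataclass
class SynSolution:  # simplified mirror of the SynSolution message
    status: str                    # realizable, unrealizable, error, timeout, nonsuccess
    tool: str
    circuit: Optional[str] = None  # AIGER string; not required on error/timeout/nonsuccess


class Component(ABC):
    @abstractmethod
    def setup(self, parameters: Dict[str, str]) -> bool:
        """Handle a SetupRequest; return the SetupResponse success flag."""


class SymbolicSolver(Component):
    @abstractmethod
    def synthesize(self, problem: SynProblem) -> SynSolution:
        """Handle a SynProblem and eventually answer with a SynSolution."""


class ModelChecker(Component):
    @abstractmethod
    def model_check(self, problem: SynProblem, circuit: str, realizable: bool) -> bool:
        """Handle an MCProblem; True if the circuit satisfies the specification."""


class AlwaysTimeout(SymbolicSolver):
    """Toy solver that only illustrates the contract; it never finds a circuit."""

    def setup(self, parameters):
        self.timeout = float(parameters.get("timeout", "60"))
        return True

    def synthesize(self, problem):
        return SynSolution(status="timeout", tool="always-timeout")


solver = AlwaysTimeout()
print(solver.setup({"timeout": "10"}))
print(solver.synthesize(SynProblem(["r"], ["g"], ["G (r -> F g)"], [])))
```

A neural solver would combine both roles. It answers synthesis requests like a `SymbolicSolver`, and it acts as a client of a `ModelChecker`.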


4 The Neural Solver 


The neural solver is at the heart of the portfolio solver and is developed jointly 
with NeuroSynt. We report on the methodology of the neural solver, including 
architecture, datasets, data generation, training, and evaluation. We clearly dis- 
tinguish between previous work [57], introducing a neural approach for reactive 
synthesis, and improvements that are integrated into NeuroSynt, leading to the 
significantly increased performance on the SYNTCOMP benchmarks. 


4.1 Data and Data Generation Improvement 


We significantly improved the training data generation compared to previous 
work. While the basic algorithm is taken from [57], we scale the size of the 
training samples, tweak the data generation parameters to fit the larger samples, 
and combine multiple data generation strategies to lift previous limitations. 

We aim for a dataset containing specifications (assumptions and guarantees) and circuits. Depending on the specification, the circuit is either a winning strategy for the system (realizable) or a winning strategy for the environment (unrealizable). For each sample, we use an additional token to indicate whether the system is realizable or unrealizable. The dataset is for supervised training, with the specification being the input and the circuit, along with the realizability token, being the model's target.

In total, we combine three datasets and generation techniques. For the first 
two datasets, we utilize the generation method from [57] with 1) the minor tweak 
of having a variable number of inputs and outputs (up to five) in the circuit 
instead of exactly five (denoted previous), and 2) extensions to handle a larger 
number of patterns, larger patterns, and patterns with more atomic propositions 
(denoted new). The third dataset is a data augmentation method based on the 
result of new (denoted augmented). 


Data Generation. We first report on the data generation algorithm for previous and new. The data generation has two major steps. In the first step, we mine LTL formula patterns that are common in research and practice. Considering formula patterns is a widespread idea, e.g., [24]. We collect patterns from 1075 (previously 346) benchmarks from the LTL synthesis track of SYNTCOMP 2022 and extract a list of 627 assumption patterns and 7948 guarantee patterns. An assumption restricts the environment, and a guarantee defines the implementation's behavior. To fit the model requirements, we filter out LTL formulas with more than 15 inputs and 15 outputs (previously 5). Additionally, we filter out specifications with an abstract syntax tree (AST) size greater than 30 (previously 25), resulting in 519 (previously 157) assumption patterns and 6841 (previously 1942) guarantee patterns. In the second step, we construct synthesis specifications by combining the mined patterns. For each specification, we alternate between sampling guarantees until the specification becomes unrealizable and sampling assumptions until the specification becomes realizable. Depending on whether we aim for a realizable or an unrealizable specification, we collect either the last successfully mined specification (realizable) or the second-to-last mined specification (unrealizable). We aim for an even split between realizable and unrealizable specifications. To handle more atomic propositions while reducing the number of patterns that do not share atomic propositions, we now favor atomic propositions already present in the constructed part of the specification, with a bias of 4, when instantiating the patterns. We continue this process until we reach one of the following stopping criteria:


a) the specification has the maximal number of guarantees (10),
b) the specification has the maximal number of assumptions (3),
c) the synthesis tool timed out (120 s timeout), or
d) no suitable assumption was found after 7 (previously 5) attempts.
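The alternating construction can be sketched in Python as follows. The realizability oracle stands in for a synthesis tool such as Strix and is hypothetical, as is the toy oracle in the usage example. Pattern instantiation is omitted, including the bias toward already used atomic propositions. The sketch returns the most recent specification with the desired verdict, which is our reading of which specification is collected. The constants mirror stopping criteria a), b), and d). A tool timeout, criterion c), is modeled by the oracle returning None.

```python
import random

MAX_GUARANTEES = 10       # criterion a)
MAX_ASSUMPTIONS = 3       # criterion b)
MAX_ASSUMPTION_TRIES = 7  # criterion d)


def construct_spec(guarantee_pool, assumption_pool, is_realizable, want_realizable, rng):
    """Alternate between adding guarantees (until unrealizable) and adding one
    assumption (until realizable again). is_realizable(assumptions, guarantees)
    returns True, False, or None if the synthesis tool timed out."""
    assumptions, guarantees = [], []
    last = {True: None, False: None}  # most recent spec seen with each verdict

    while True:
        # Add guarantees until the specification becomes unrealizable.
        verdict = True
        while verdict:
            if len(guarantees) >= MAX_GUARANTEES:
                return last[want_realizable]
            guarantees.append(rng.choice(guarantee_pool))
            verdict = is_realizable(assumptions, guarantees)
            if verdict is None:  # criterion c)
                return last[want_realizable]
            last[verdict] = (list(assumptions), list(guarantees))
        # Try to find one assumption that makes the specification realizable.
        if len(assumptions) >= MAX_ASSUMPTIONS:
            return last[want_realizable]
        for _ in range(MAX_ASSUMPTION_TRIES):
            candidate = assumptions + [rng.choice(assumption_pool)]
            verdict = is_realizable(candidate, guarantees)
            if verdict is None:  # criterion c)
                return last[want_realizable]
            if verdict:
                assumptions = candidate
                last[True] = (list(assumptions), list(guarantees))
                break
        else:  # no suitable assumption found
            return last[want_realizable]


def toy_oracle(assumptions, guarantees):
    """Hypothetical oracle: each assumption 'pays for' one extra guarantee."""
    return len(guarantees) <= len(assumptions) + 1


spec = construct_spec(["G (r -> F g)"], ["G F r"], toy_oracle, True, random.Random(0))
print(len(spec[0]), len(spec[1]))  # 3 4
```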


To ensure an even distribution of challenging instances, we filter out AIGER circuits whose maximum variable index exceeds 60 and allow at most 2096 circuits with the same number of AND gates.
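If circuits are stored in the ASCII AIGER format (an assumption), this filter only needs the header `aag M I L O A`. There, M is the maximal variable index and A the number of AND gates. Below is a minimal Python sketch; the function and parameter names are hypothetical.

```python
from collections import Counter


def filter_circuits(samples, max_var_index=60, max_per_and_count=2096):
    """Yield the (spec, aag_text) pairs whose circuit has a maximal variable
    index <= max_var_index, keeping at most max_per_and_count circuits for
    each number of AND gates."""
    per_and_count = Counter()
    for spec, aag in samples:
        header = aag.split(maxsplit=6)[:6]  # ['aag', M, I, L, O, A]
        max_var, n_and = int(header[1]), int(header[5])
        if max_var > max_var_index or per_and_count[n_and] >= max_per_and_count:
            continue
        per_and_count[n_and] += 1
        yield spec, aag


samples = [
    ("s1", "aag 3 2 0 1 1\n2\n4\n6\n6 2 4\n"),
    ("s2", "aag 61 1 0 1 0\n2\n2\n"),            # variable index too large
    ("s3", "aag 5 2 0 1 1\n2\n4\n10\n10 2 5\n"),  # same AND count as s1
]
print([s for s, _ in filter_circuits(samples, max_per_and_count=1)])  # ['s1']
```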


Fig. 4. Previous dataset [57] compared with the new final dataset: the number of atomic propositions in a sample, the largest variable id in the AIGER circuit, and the average size of properties.

Data Augmentation. As a third approach, we augment the dataset new to artificially force larger properties for a share of the final dataset. For each specification in new, we combine multiple patterns into one property until we reach an AST size of 30. Having longer properties in the training dataset leads to better generalization to even larger properties. Compared to new, the augmented dataset has an average of 3 guarantees instead of 5.6, with an average size of 22.9 per guarantee instead of 12.3.
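Here is one possible reading of the augmentation step, sketched in Python. Guarantees are greedily merged by conjunction as long as the merged property's AST size stays at most 30. The tuple encoding of formulas and the use of conjunction as the combining operator are our assumptions. The section does not state how patterns are combined.

```python
def ast_size(formula):
    """Number of nodes in a formula encoded as nested tuples (op, *children)."""
    if isinstance(formula, str):  # atomic proposition
        return 1
    return 1 + sum(ast_size(child) for child in formula[1:])


def augment(guarantees, max_size=30):
    """Greedily conjoin consecutive guarantees into properties of AST size <= max_size."""
    merged, current = [], None
    for g in guarantees:
        candidate = g if current is None else ("and", current, g)
        if current is not None and ast_size(candidate) > max_size:
            merged.append(current)  # current property is full; start a new one
            current = g
        else:
            current = candidate
    if current is not None:
        merged.append(current)
    return merged


request_grant = ("globally", ("implies", "r", ("eventually", "g")))  # size 5
properties = augment([request_grant] * 6)
print([ast_size(p) for p in properties])  # [29, 5]
```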


Final Dataset. All three resulting datasets are combined into a single dataset consisting of 600000 training samples and 75000 validation and test samples. Figure 4 shows the key differences in features of the new final dataset compared to the previous dataset [57]. While the previous dataset used only up to 5 inputs and outputs in the specification, we now have up to 15 inputs and outputs, leading to up to 25 atomic propositions in a specification. We also have slightly more latches in the new dataset (1.23 compared to 1.16 previously). Note that the same version and configuration of Strix [46] was used in both approaches. The most apparent distinction from the previous dataset [57] is in the size of the properties, where we clearly see the effects of the data augmentation process.


4.2 Architecture & Training 


Transformer Architecture. The core of the neural solver implemented in
NeuroSynt is a Transformer neural network [63]. The vanilla Transformer
architecture follows a basic encoder-decoder structure. The encoder constructs
a hidden embedding $z_i$ for each input embedding $x_i$ of the input sequence
$x = (x_0, \ldots, x_n)$ in parallel. An embedding is a mapping from plain input, for
example, words or characters, to a high-dimensional vector, for which learning
algorithms and toolkits exist, e.g., word2vec [47]. Given the encoder's output
$z = (z_0, \ldots, z_n)$, the decoder generates a sequence of output embeddings
$y = (y_0, \ldots, y_m)$ autoregressively. Since the Transformer architecture contains
neither recurrence nor convolution, we apply a tree positional encoding [60].

Fig. 5. Schematic view of the Hierarchical Transformer, with illustrated inputs/outputs
of the reactive synthesis application. The encoder shows the hierarchical self-attention
with separation into local and global layers. For simplicity, we show one local and one
global layer and only two assumptions and guarantees with two tokens each.
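To illustrate what a tree positional encoding provides, the following simplified sketch in the spirit of [60] encodes each AST node by its path from the root: every step to a child pushes a one-hot vector of the child index, the most recent step comes first, and the result is zero-padded or truncated to a fixed depth. The learned parameters and the projection to the model dimension used in practice are omitted, and the function names are hypothetical.

```python
def tree_positions(formula, path=()):
    """Yield (token, path) pairs in pre-order; `path` lists child indices from the root.

    Formulas are nested tuples (operator, child, ...); atoms are strings."""
    if isinstance(formula, str):
        yield formula, path
        return
    yield formula[0], path
    for index, child in enumerate(formula[1:]):
        yield from tree_positions(child, path + (index,))


def encode_path(path, max_depth, branching=2):
    """Stack of one-hot child indices (most recent step first), zero-padded
    to length branching * max_depth; steps closest to the root are dropped
    when the path is deeper than max_depth."""
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    vector = []
    for index in reversed(path[-max_depth:]):
        one_hot = [0.0] * branching
        one_hot[index] = 1.0
        vector.extend(one_hot)
    vector.extend([0.0] * (branching * max_depth - len(vector)))
    return vector
```

Unlike sequential positions, two tokens at the same place in different subtrees get the same encoding, which matches the tree structure of LTL formulas.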

The main idea of the Transformer is a self-attention mechanism that computes
a score for each pair of input elements, representing which positions in the
sequence should be considered the most when computing the hidden embeddings.
For each input embedding $x_i$, we compute 1) a query vector $q_i$, 2) a key vector
$k_i$, and 3) a value vector $v_i$ by multiplying $x_i$ with weight matrices $W_Q$, $W_K$, and
$W_V$, which are learned during the training process. The embeddings can be
calculated simultaneously using matrix operations [63]. Specifically, let $Q$, $K$, $V$ be
the matrices obtained by multiplying the input matrix $X$ consisting of all $x_i$ with
the weight matrices $W_Q$, $W_K$, and $W_V$:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^\top}{\sqrt{d_k}}\right) V,$$

with $d_k$ being the dimension of the key vectors. For details, we refer the interested
reader to [63]. The Transformer variant used in this paper is a so-called hierarchical
Transformer [44], which separates the encoder self-attention into local and global
layers. Local layers embed each assumption and guarantee individually, invariant
to their order. Global layers calculate the self-attention across all assumptions
and all guarantees. We show an illustration in Figure 5.
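The attention formula can be written out in a few lines of pure Python. The sketch also shows one way to realize local layers, namely a block-diagonal mask so that each token only attends to tokens of its own assumption or guarantee, while global layers use no mask. This masking formulation is our assumption for illustration; [44] may realize local attention differently, and learned projections, multiple heads, and feed-forward sublayers are omitted.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Numerically stable softmax; -inf entries get weight 0.
    Assumes at least one finite entry per row."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v, mask=None):
    """softmax(Q K^T / sqrt(d_k)) V; mask[i][j] == False hides key j from query i."""
    d_k = len(k[0])
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, transpose(k))]
    if mask is not None:
        scores = [[s if allowed else -math.inf for s, allowed in zip(row, mrow)]
                  for row, mrow in zip(scores, mask)]
    return matmul([softmax(row) for row in scores], v)


def local_mask(property_ids):
    """Block-diagonal mask: tokens attend only within their own property."""
    return [[pi == pj for pj in property_ids] for pi in property_ids]
```

With property_ids = [0, 0, 1], the third token forms a property of its own, so under the local mask its output is exactly its own value vector.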


Model Hyperparameters & Training. We train our model on the 600 000 samples
from our training dataset for 80 000 steps with early stopping and a batch size
of 512. We show a plot of the accuracy per sequence in Figure 6. We train
data-parallel on two Nvidia A100 40GB GPUs of an Nvidia DGX A100 system, which
takes approximately 10 hours. We use the Adam optimizer [40] with $\beta_1 = 0.9$,
$\beta_2 = 0.98$ and $\epsilon = 10^{-9}$. We use learning rate scheduling as proposed in [63]
with 4000 warmup steps. Our model consists of 4 local, 4 global, and 8 decoder
layers, each having 4 heads. All feed-forward networks have 1024 nodes to which
we apply a dropout of 0.2. Our model has a total size of 14 791 748 parameters.
Input and output tokens have an embedding of size 256. The maximum input and
target lengths are set according to the training data with at most 12 properties,
a maximum AST size of 32 per property for the specification, and a maximum
circuit length of 128 tokens after encoding.

Fig. 6. Accuracy per sequence during training, measured on training and validation
data.
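The learning rate schedule of [63] increases the rate linearly during warmup and then decays it with the inverse square root of the step number. A minimal sketch, assuming d_model = 256 (the embedding size stated above) and the 4000 warmup steps; any additional scaling factor in NeuroSynt's training code is not shown.

```python
def transformer_lr(step, d_model=256, warmup_steps=4000):
    """Learning rate of Vaswani et al. [63]:
    d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)."""
    step = max(step, 1)  # avoid division by zero at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```

The two terms of the minimum coincide at step = warmup_steps, so the learning rate peaks there, at roughly 9.9e-4 for these values.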

We reimplemented and adapted the previous model [57] to evaluate it on the
SYNTCOMP 2022 benchmarks, which shows that our model significantly improves
on previous work: compared to the 21.8% of the previous model, the new model
achieves 34.83%, an improvement of 13 percentage points. We explore more details
of the evaluation of the model in Section 5.1.


5 Experiments & Benchmarks 


We split our experiments into two segments. In Section 5.1, we first perform
generalization experiments on the integrated neural solver. The neural solver
generalizes not only within its training distribution but also to more complex
instances, longer specifications, and out-of-distribution instances, which we show
using the datasets test, large, timeouts, and syntcomp.

Secondly, in Section 5.2, we evaluate the performance of the NeuroSynt frame- 
work on the SYNTCOMP 2022 benchmarks. To this end, we use NeuroSynt to 
compare the performance of the neural solver against multiple symbolic solvers 
and highlight efficiency gains and enhancements that arise from the combination 
of both methodologies. We show that the combined effort of neural and sym- 
bolic solvers leads to a performance gain that symbolic solvers alone could not 
achieve. 

The evaluation is performed on a GPU cluster (1 Nvidia DGX A100 40GB,
AMD EPYC 7F32 @ 1.8GHz base, 3.7GHz max, 8 cores + 8 SMT cores, 256GB
RAM) and on a CPU cluster (Intel Xeon E7-8867 v3 @ 2.50GHz, 64 cores + 64 HT
cores, and 1536 GB RAM); we additionally ran some early experiments on an
Apple M1 Max (64GB memory, 10 cores, 32 neural cores).

Similar to different configurations of symbolic solvers, we have multiple
models with slightly different performances. This paper reports the results of the
model that performed best on the SYNTCOMP benchmark. Whenever we consider
additional models, we state this explicitly.


5.1 Generalization 


We analyze the generalization capabilities of the model in the neural solver in
four ways: firstly on our test set (test), secondly on samples that are significantly
larger than seen during training (large), thirdly on samples that are arguably
more difficult than training samples (timeouts), and fourthly on out-of-distribution
samples (syntcomp). Here, we consider instances that do not stem from our data
generation algorithm to be out-of-distribution. Results on these datasets are
shown in Table 1.


Generalization on test and large. On the datasets test and large, in addition
to measuring correct solutions (semantic accuracy; 84.2% on test), we count how
many solutions are syntactically identical to the solution from our data
generation algorithm (syntactic accuracy; 38.6% on test). The large difference of
45.6 percentage points on our test dataset indicates that the neural solver
generalizes to the semantics of the synthesis problem instead of learning the
particularities of the data generator.

The dataset large consists of larger samples than seen during training. Samples
in large have at least 10 and on average 14.5 properties, and the largest property
in each sample has an AST size of 37.9 on average. In contrast, training samples
have 5.3 properties on average, with the largest property having an AST size of
22.2 on average.

For a more detailed analysis, we join datasets test and large and plot the
share of correct solutions, partitioned by the number of properties as well as the
size of the largest property in each sample, in Figure 7. The largest property size
seen during training is 30, and the largest number of properties per specification
is 12. While we see a decrease in performance for larger samples, there is no clear
drop after 12 or 30, respectively, which indicates generalization in both the number
and the length of the properties. Note that results for larger sizes are naturally
less significant, as fewer samples per bucket exist. We refer to the full version [20]
for more details on the number of samples in each displayed bucket.
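A bucketing analysis like the one in Figure 7 can be sketched as grouping samples by their number of properties and the (binned) size of their largest property, while keeping the sample count per bucket so that sparsely populated buckets can be recognized. The bin width and input format are assumptions for illustration.

```python
from collections import defaultdict


def accuracy_by_bucket(results, size_bin=5):
    """Share of correct solutions per (number of properties, binned largest size).

    results: iterable of (num_properties, largest_property_size, correct) tuples.
    Returns {(num_properties, size_bucket): (share_correct, sample_count)}."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for num_props, largest, ok in results:
        key = (num_props, largest // size_bin * size_bin)
        totals[key] += 1
        correct[key] += int(ok)
    return {key: (correct[key] / totals[key], totals[key]) for key in totals}
```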


Table 1. Performance of the neural model on different datasets.

                     test    large   timeouts   syntcomp-small   syntcomp-large   syntcomp-full
syntactic accuracy   38.6%   10.2%   -          -                -                -
semantic accuracy    84.2%   57.7%   33%        65.8%            54.5%            34.83%
Fig. 7. Share of correct solutions on the joint dataset of large and test over the
number of properties in a sample and the size of the largest property in each sample.
A darker background indicates sizes larger than seen during training.


Generalization on timeouts. The dataset timeouts consists of samples on which
Strix timed out after 120s during our data generation. Such samples can therefore
be seen as significantly harder, although not larger, than samples in the training
data. The neural solver correctly solves 33% of this dataset, showing that our model
generalizes from the training data to more challenging specifications that Strix
could not solve during the data generation. This experiment indicates the potential
of combining neural methods with symbolic methods.


Generalization on out-of-distribution dataset. While large and timeouts were
generated with the same data generation approach as the training data,
syntcomp-full consists of all 1075 real-world specifications collected in the
SYNTCOMP benchmark, on which the neural solver achieves 34.83% accuracy
(see Table 1). syntcomp-large contains all such samples that fall within our
evaluation constraints (i.e., at most 30 properties and an AST size of at most 70 per
property; 54.5% accuracy). syntcomp-small contains only those samples that are
within the training data size (i.e., at most 12 properties and an AST size of at most
30 per property; 65.8% accuracy). We see a remarkable generalization to
out-of-distribution samples, with an accuracy of 65.8% on syntcomp-small. We
additionally observe the same generalization to larger specifications that we see on
the large dataset.
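The three SYNTCOMP subsets can be expressed as simple size predicates using the thresholds stated above. We assume the subsets are nested (every syntcomp-small sample also belongs to syntcomp-large and syntcomp-full); the function name is hypothetical.

```python
def syntcomp_subsets(num_properties, largest_property_size):
    """Evaluation subsets a SYNTCOMP specification belongs to (assumed nested)."""
    subsets = ["syntcomp-full"]
    if num_properties <= 30 and largest_property_size <= 70:  # evaluation constraints
        subsets.append("syntcomp-large")
    if num_properties <= 12 and largest_property_size <= 30:  # training data size
        subsets.append("syntcomp-small")
    return subsets
```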
Fig. 8. A cactus plot showing the number of solved samples vs. accumulated wall-clock
time. Each sample per solver is a dot on the respective line. The lower and further
right a line, the better the solver. We compare the neural solver, Strix and BoSy
alone, NeuroSynt_BoSy, which couples BoSy and the neural solver, and NeuroSynt_Strix,
which couples the neural solver and Strix. Further, we virtually combine the results
of all tools and all configurations from the SYNTCOMP into a virtual best solver and
compare that with the evaluations of multiple neural models.


5.2 Comparisons and Advantages of Combination 


We demonstrate the advantage of NeuroSynt by comparing the neural solver
to the performance of multiple symbolic solvers: Strix [46], the current state of
the art, and BoSy [27], a bounded synthesis method; we additionally rely on the
results of SYNTCOMP 2022 (ltlsynt [55], Otus [1], sdf [38]). Whenever we write
SYNTCOMP in this paper, we refer to the 2022 iteration.

We begin the evaluation by comparing the neural solver and NeuroSynt to
these symbolic tools, illustrating the number of problems that can be
solved within a specific time frame. Then we dive into details on instances that
could only be solved by NeuroSynt and by no symbolic solver (novel solves),
show details on the time-to-solution differences between the solvers, and lastly,
look at the circuit sizes of their respective solutions.


In Figure 8, we display the performance of the neural solver, the performance
of several symbolic tools, and the performance of NeuroSynt, which unites
the neural solver with a symbolic solver. We additionally show a Virtual Best
Solver (VBS) of all SYNTCOMP 2022 results, both without and including the neural
solver. We further report what the previously published neural reactive synthesis
approach [57] would have achieved if it had been integrated into the portfolio
solver. With 374 solved instances, the neural solver alone already solves more
samples than BoSy (347) with a 120s timeout on the CPU cluster. Its true
advantage becomes evident when combining the neural solver with symbolic solvers.
NeuroSynt_BoSy solves 152 (previous: 59) samples more than BoSy alone, which
is 20.8% of the samples that BoSy could not solve. Similarly, NeuroSynt_Strix
solves 31 (previous: 2) samples more than Strix alone (1h timeout on the CPU
cluster), which is 14.2% of the samples that Strix could not solve in 1h. To show
the full potential of NeuroSynt, we combined all results from the SYNTCOMP
and our experiments with BoSy and Strix. All symbolic solvers combined were
able to solve 945 of the 1075 instances. Adding the neural solver of NeuroSynt
to the virtual best solver, we solve an additional 20 (previous: 0) samples that
no other tested tool solved (novel solves). This is 15.4% of the samples that none
of the symbolic tools could solve in 1h. Apart from the state-of-the-art Strix, no
tool in SYNTCOMP 2022 solved more samples that no other tool could solve. We
refer to the full version [20] for exact numbers. This signifies that even for
specifications that pose computational challenges to symbolic synthesis tools,
there exist patterns that a neural network can recognize and exploit after training.
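The virtual best solver, the novel solves, and the reported shares follow from per-instance results by simple set operations. A minimal sketch, assuming results are given as a mapping from solver name to the solving times of the instances it solved; the names and toy data are illustrative.

```python
def virtual_best(times):
    """times: {solver: {instance: seconds}}, unsolved instances absent.
    Returns {instance: best time over all solvers}."""
    best = {}
    for per_instance in times.values():
        for instance, t in per_instance.items():
            if instance not in best or t < best[instance]:
                best[instance] = t
    return best


def novel_solves(times, new_solver):
    """Instances solved by `new_solver` but by no other solver."""
    others = set()
    for solver, per_instance in times.items():
        if solver != new_solver:
            others.update(per_instance)
    return set(times[new_solver]) - others


def share_of_unsolved(gained, total, solved_before):
    """Fraction of previously unsolved instances that are now solved."""
    return gained / (total - solved_before)
```

With the numbers above, share_of_unsolved(152, 1075, 347) evaluates to about 0.2088 (reported as 20.8%), and share_of_unsolved(20, 1075, 945) to about 0.154 (15.4%).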


Novel Solves. Of the 20 novel solves, 6 instances are parameterized versions of
full arbiters with 3 processes. This version of the full arbiter is unrealizable, as
the specification additionally enforces two grants to hold at the same time step
(at steps 11 to 16, respectively). These are the largest parameterizations of this
problem class in the SYNTCOMP dataset. Similarly, 11 instances are full arbiters
with 3 processes, where two grants are enforced simultaneously (at steps 6 to 16,
respectively). These parameterizations are also the largest parameterizations of
this problem class in the SYNTCOMP dataset. One instance is a full arbiter
with 6 processes and the requirement of two grants to hold at any time step.
Finally, we have one instance of a load balancer with 6 grants and the additional
unrealizable requirement of two grants at time step 5. This is also the largest
parameterization of this problem class in the SYNTCOMP dataset. For examples
of the novel solves, we refer the reader to the artifact or the full version [20].


Time To Solution. For experiments with NeuroSynt, we record the wall-clock
time of the neural solver, the symbolic solver, and the model checker. The neural
solver (including model checking) is fastest on the GPU cluster, with an average
of 8.6s and a standard deviation of only 3.3s. The time for model checking using
nuXmv is almost negligible, with 0.35s on average per sample. The low standard
deviation highlights the advantage of the neural solver, as its runtime does not
depend on the complexity of the specification. Strix with a timeout of 1h on the
CPU cluster takes 33.4s on average, with a standard deviation of 185.3s. We find
that the neural solver can also be run on CPU-only hardware (CPU cluster) with an
average of 79.4s and on hybrid desktop hardware such as the Apple M1 Max
with an average of 17.8s. For an extensive overview of the experiments with
different timeouts, we refer the reader to the full version [20].


Circuit Sizes. We find that on instances where the neural solver and the symbolic
solver both found a solution, the solution by the neural solver is often smaller
than the symbolic solver's. This holds for BoSy and Strix, but also for all other
tools in SYNTCOMP (on the realizable fraction). On samples solved by both Strix
and the neural solver, the solutions by the neural solver have 54.9% fewer latches
than those by Strix. In Figure 9, we show the distribution of latches for this
comparison. For more details, we refer the reader to the full version [20].

Fig. 9. Number of latches per instance, on instances solved by both the neural solver
and Strix.


6 Conclusion 


We introduced NeuroSynt, a neuro-symbolic portfolio solver for reactive
synthesis. At the core of the portfolio solver lies an integrated neural solver that
computes candidate implementations, which are automatically checked by
model-checking tools. We reported on the neural solver's methodology and training,
as well as on the implementation of the API framework that isolates the components. The open-source
implementation of NeuroSynt provides an interface in which new neural and 
symbolic approaches alike can be seamlessly integrated. 
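The portfolio idea can be illustrated by racing a neural path (predict a candidate, then model check it) against a sound symbolic solver and accepting the first verified result. This is a sketch under stated assumptions, not NeuroSynt's API: the three solver and checker callables are hypothetical placeholders, and threads are used only for brevity.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def portfolio(spec, neural_solver, model_checker, symbolic_solver, timeout=None):
    """Return ("neural" | "symbolic", circuit) for the first verified result, else None.

    neural_solver(spec)          -> candidate circuit or None (may be incorrect)
    model_checker(spec, circuit) -> True iff the circuit satisfies the spec
    symbolic_solver(spec)        -> circuit or None (assumed sound, not re-checked)
    All three callables are hypothetical placeholders.
    """

    def neural_path():
        candidate = neural_solver(spec)
        # Neural candidates are only accepted after model checking.
        if candidate is not None and model_checker(spec, candidate):
            return "neural", candidate
        return None

    def symbolic_path():
        circuit = symbolic_solver(spec)
        return ("symbolic", circuit) if circuit is not None else None

    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(neural_path), pool.submit(symbolic_path)]
        # as_completed yields futures in completion order (TimeoutError on timeout).
        for future in as_completed(futures, timeout=timeout):
            result = future.result()
            if result is not None:
                return result
    return None
```

Because running Python threads cannot be cancelled, leaving the with block still waits for the slower path; a real portfolio would run the solvers as separate processes that can be terminated once one of them succeeds.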

Our experiments on the generalization capabilities of the Transformer show 
the ability to generalize to larger specifications, more difficult specifications, 
and out-of-distribution specifications. The relatively small size of the underly- 
ing Transformer neural network suggests that the overall performance of neural 
solvers can be further increased. 

We evaluated the overall performance of NeuroSynt, which advances the state
of the art in reactive synthesis, with the integrated neural solver contributing
novel solves on the SYNTCOMP 2022 benchmark. With the almost constant
evaluation time of the neural solver, the portfolio solver is often faster than 
previous approaches. Furthermore, the integrated neural solver yields smaller 
implementations than state-of-the-art symbolic tools, including Strix and BoSy. 


7 Data Availability Statement 


NeuroSynt is published open-source on GitHub
(https://github.com/reactive-systems/neurosynt). All data, models, and experiments
supporting this paper's results are publicly available. A digital artifact is
available at https://doi.org/10.5281/zenodo.10046523.
Abstract. The HALIVER tool integrates deductive verification into the
popular scheduling language HALIDE, used for image processing pipelines 
and array computations. HALIVER uses VERCORS, a separation logic- 
based verifier, to verify the correctness of (1) the HALIDE algorithms and 
(2) the optimised parallel code produced by HALIDE when an optimisa- 
tion schedule is applied to an algorithm. This allows proving complex, 
optimised code correct while reducing the effort to provide the required 
verification annotations. For both approaches, the same specification is 
used. We evaluated the tool on several optimised programs generated 
from characteristic HALIDE algorithms, using all but one of the essen- 
tial scheduling directives available in HALIDE. Without annotation effort, 
HALIVER proves memory safety in almost all programs. With annotations,
HALIVER additionally proves functional correctness properties.
We show that the approach is viable and reduces the manual annotation 
effort by an order of magnitude. 


Keywords: Program correctness - Deductive verification - Scheduling 
language. 


1 Introduction 


To meet the continuously growing demands on software performance, parallelism
is increasingly often needed [13]. However, introducing parallelism tends to in- 
crease the risk of introducing errors, as the interactions between parallel compu- 
tations can be hard to predict. Moreover, a plethora of optimisation techniques 
exists [10], so identifying when an optimisation can be applied safely, without 
breaking correctness, can be very challenging. Also, applying optimisations tends 
to make a program more complex, making it harder to reason about. 

To address this, on the one hand, various domain-specific languages (DSLs)
have been proposed that separate the algorithm (what it does) from the
parallelisation schedule (how it does it). These are called scheduling languages
[6,8,22,23,28]. Given an algorithm and a schedule, a compiler generates an
optimised parallel program. This approach crucially depends on the schedule not
introducing any errors in the functionality, which is not always obvious.

Fig. 1. High level overview of our approach.

On the other hand, deductive program verification [9] has been successfully
applied to verify the functionality of parallel programs [4]. This requires that the
intended functionality is formalised as a contract, for instance using
permission-based separation logic [15]. A major hurdle preventing this technique
from being adopted at a large scale is that if a program becomes more complicated,
the required annotations rapidly grow in size and complexity [25,26].

In this paper, we combine the best of both worlds. We propose the HALIVER
tool, which focusses on HALIDE [22,23], a scheduling language for portable image
computations and array processing. It has been widely adopted in industry, for
instance to produce parts of Adobe Photoshop and to implement the YouTube
video-ingestion pipeline. For verification, we use the VERCORS program verifier
[4]. In this paper, we define two verification approaches, (1) front-end and (2)
back-end, as seen in Figure 1. Both approaches verify that the program adheres
to the same functional specification. This specification is detailed by annotating
the algorithmic part of a HALIDE program, thereby keeping the annotations
focussed on the functionality, and therefore relatively straightforward. With the
front-end verification approach, we verify the correctness of the algorithmic part
of a HALIDE program: HALIVER transforms the algorithm and the annotations
into an annotated VERCORS program. With the back-end verification approach, we
verify the C code that the HALIDE compiler generates, given a HALIDE algorithm
and a schedule: HALIVER transforms the given annotations to match the
generated code. Furthermore, where possible, HALIVER generates annotations,
such as permission specifications, to relieve the user from having to write these
manually. This contributes to making the annotation process straightforward.

In this way, HALIVER allows the user to succinctly specify the intended 
functionality of optimised, parallel code, and it checks that the resulting program 
indeed has the desired functionality. A major advantage of our approach is that 
it is flexible to use in a setting where multiple compiler passes are made. Also, it 
can be easily extended if a new compiler pass or schedule optimisation is added. 
An alternative would be to prove the correctness of the compiler, but this would
require a large amount of initial work, plus additional work for each change to the
compiler.


Concretely, this paper provides the following contributions:

– An annotation language to describe the functionality of HALIDE algorithms,
which is integrated into the HALIDE algorithm language;
– Tool support for the front-end verification approach of HALIDE algorithms;
– Tool support for the back-end verification approach, which can verify programs
generated by the HALIDE compiler from an algorithm and a schedule;
– Evaluation of the HALIVER tool on HALIDE examples using all but one
of the essential scheduling directives available, in various combinations. We
evaluated the tool on 23 different optimised programs, generated from eight
characteristic HALIDE algorithms, to prove memory safety with no annotation
effort. In 21 cases, HALIVER proves memory safety; for the remaining two cases,
we discuss the limitations. For 20 programs, based on five algorithms, we
also add annotations for functional correctness properties. For 19 of these
programs, HALIVER proves correctness; for the remaining one, we run into a
similar limitation.

Listing 1. HALIDE blur example with annotations added to verify the code.

1 requires inp.x.min == blur_y.x.min ∧ inp.x.max == blur_y.x.max+2 ∧ inp.y.min ==
    blur_y.y.min ∧ inp.y.max == blur_y.y.max+2;
2 ensures ∀ x, y . blur_y.x.min≤x≤blur_y.x.max ∧ blur_y.y.min≤y≤blur_y.y.max ⇒
    blur_y(x,y) == ((inp(x,y)+inp(x+1,y)+inp(x+2,y))/3 + (inp(x,y+1)+inp(x+1,y+1)+
    inp(x+2,y+1))/3 + (inp(x,y+2)+inp(x+1,y+2)+inp(x+2,y+2))/3)/3;
3 void blur(Buffer<2,int> inp, Func &blur_y){
4   Func blur_x; Var x, y;
5   blur_x(x,y) = (inp(x,y) + inp(x+1,y) + inp(x+2,y))/3;
6   blur_x.ensures(blur_x(x,y) == (inp(x,y) + inp(x+1,y) + inp(x+2,y))/3);
7   blur_y(x,y) = (blur_x(x,y) + blur_x(x,y+1) + blur_x(x,y+2))/3;
8   blur_y.ensures(blur_y(x,y) == ((inp(x,y)+inp(x+1,y)+inp(x+2,y))/3 + (inp(x,y+1)+inp
      (x+1,y+1) + inp(x+2,y+1))/3 + (inp(x,y+2)+inp(x+1,y+2)+inp(x+2,y+2))/3)/3);}

The remainder of this paper is organised as follows. Section 2 gives brief
background information on HALIDE and VERCORS. Section 3 introduces HALIDE
annotations and describes how HALIVER supports the verification of an algorithm
and an optimised program. The approach is illustrated on characteristic
examples. Section 4 evaluates the HALIVER tool, and Sections 5 and 6 address
related work, conclusions and future work.


2 Background 


HALIDE. HALIDE is a DSL embedded in C++, targeting image processing
pipelines and array computations.³ HALIDE separates the algorithm,
defining what you want to calculate, from the schedule, defining how the
calculation should be performed. Typically, when optimising code for a specific
architecture, the code becomes much more complex and loses portability. By
separating the schedule, the code expressing the functionality is not altered.
Listing 1 presents the HALIDE algorithm for a box filter, or blur function.
The reader can ignore the requires and ensures annotations for now. Images
are represented as pure (side-effect free) functions that point-wise map
coordinates to values. A blur function defines how every pixel, referred to by its
two-dimensional coordinates, should be updated. In the example, the coordinates are
represented by the variables x and y. HALIDE uses a functional style, allowing
algorithms to be compact and loop-free. HALIDE functions are denoted by the
keyword Func. In the example, the input image is stored in a two-dimensional
integer buffer inp, and the output is given by defining the function blur_y, a
reference to which is a parameter of blur. A pipeline of function calls is defined:
the function blur_x is applied to the input image (line 5). The output of that
function is used to compute the final image with the function blur_y (line 7).
With inp.x.min and inp.x.max, we refer to the minimum and maximum value
of the dimension inp.x, respectively.

³ A HALIDE tutorial can be found at https://halide-lang.org/tutorials/

Listing 2. A reduction to count the positive numbers of each row in matrix inp.

1 void cnt(Buffer<2,int> inp, Func count) {
2   Var x; RDom r(0,10);
3   count(x) = 0;
4   count.ensures(count(x) == 0);
5   count(x) = select(inp(x, r) > 0, count(x)+1, count(x));
6   count.invariant(0 ≤ count(x) ≤ r);
7   count.ensures(0 ≤ count(x) ≤ 10);}

A function may involve update definitions, which (partially) update the value
of a function. A reduction domain is a way to apply an update a finite number of
times and is typically used to express sums or histograms in HALIDE. A function
is called a reduction when such a domain is used and both an initialisation and an
update definition are given. Listing 2 presents a reduction example. For now,
ignore the ensures and invariant lines. The reduction domain (RDom) r ranges
from 0 to 9, i.e., it consists of 10 values. The initial value of the count function is
defined at line 3, and line 5 is executed once for every value in r. The expression
select(a,b,c) returns b if a evaluates to true, and c otherwise. For a given matrix
of integers inp, cnt counts the number of positive entries at the first ten positions
of each row in inp.
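To make the semantics of Listing 2 concrete, the following Python model executes the reduction for each row and checks the ensures and invariant annotations as runtime assertions. It is only an executable reading of the listing, assuming that inp[x][r] stands for inp(x, r); HALIVER instead proves these annotations statically with VERCORS.

```python
def cnt(inp, rows):
    """Executable reading of Listing 2 with its annotations as assertions.

    inp[x][r] stands for inp(x, r); `rows` lists the x coordinates to compute.
    """
    count = {}
    for x in rows:
        count[x] = 0                       # count(x) = 0
        assert count[x] == 0               # count.ensures(count(x) == 0)
        for r in range(0, 10):             # RDom r(0,10): r takes the values 0..9
            assert 0 <= count[x] <= r      # invariant: 0 <= count(x) <= r
            # count(x) = select(inp(x, r) > 0, count(x)+1, count(x))
            count[x] = count[x] + 1 if inp[x][r] > 0 else count[x]
        assert 0 <= count[x] <= 10         # ensures: 0 <= count(x) <= 10
    return count
```

For example, a row [1, -2, 3, 0, 5, 6, -7, 8, 9, 10, 99] yields 7: only the first ten entries are inspected, and 0 is not counted because the condition is inp(x, r) > 0.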


A HALIDE schedule is given in Listing 5 and further explained in Section 3.3.


VERCORS. VERCORS⁴ [4] is a deductive verifier to verify the functional
correctness of, possibly concurrent, software. Its specification language uses
permission-based separation logic [5], a combination of first-order logic and read/write
permissions. The latter are used for concurrency-related verification, to express
which data can be accessed by a thread at which moment. Programs written
in a number of languages, such as JAVA and C, can be verified. VERCORS also
has its own language, PVL. VERCORS's verification engine relies on VIPER [16],
which applies symbolic execution to analyse programs with persistent mutable
state.

⁴ An online tutorial can be found at https://vercors.ewi.utwente.nl/wiki/
Intended functional behaviour can be specified by means of pre- and
postconditions, indicated by the keywords requires and ensures, respectively. The
statement context P is an abbreviation for requires P; ensures P. Loop
invariants and assertions can be added to the code to help VERCORS in proving
the pre- and postconditions. We refer to the pre- and postconditions, loop
invariants and assertions together as the annotations of a code fragment. A permission
Perm(x, f) gives permission to memory location x, where f is a fraction, with
1\1 indicating write permission and any fraction strictly between 0\1 and 1\1 read
permission. For a statement s, we write the Hoare triple {P} s {Q}. This indicates
that if P holds in the pre-state, then after s, Q holds in the post-state. A pure
function has no side effects and can therefore be used in annotations. It has the
keyword pure in the header, and its body is a single expression. Annotations and
pure function definitions in C files are given in special comments, like //@ or
/*@...@*/ for multi-line comments. (See Listing 6 for examples.)
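The permission accounting described above can be mimicked with exact fractions. The toy model below, with a hypothetical PermissionHeap class, lets threads split and transfer Perm(loc, f) and checks that writing requires the full permission 1 while reading requires any positive fraction; it ignores everything else VERCORS tracks and is not how VIPER represents permissions.

```python
from fractions import Fraction

WRITE = Fraction(1)  # corresponds to 1\1 in VERCORS notation


class PermissionHeap:
    """Toy model of fractional permissions Perm(loc, f), tracked per thread."""

    def __init__(self, locations, owner="main"):
        # Initially, `owner` holds write permission to every location.
        self.held = {loc: {owner: WRITE} for loc in locations}

    def amount(self, loc, thread):
        return self.held[loc].get(thread, Fraction(0))

    def transfer(self, loc, src, dst, frac):
        """Move a positive fraction of the permission on `loc` from src to dst."""
        frac = Fraction(frac)
        have = self.amount(loc, src)
        if not 0 < frac <= have:
            raise ValueError(f"{src} cannot give {frac} of {loc}")
        self.held[loc][src] = have - frac
        self.held[loc][dst] = self.amount(loc, dst) + frac

    def can_read(self, loc, thread):
        return self.amount(loc, thread) > 0

    def can_write(self, loc, thread):
        return self.amount(loc, thread) == WRITE
```

Since transfers only move fractions around, the total permission per location stays 1, so two threads can both read a location, but no thread can write it while another still holds a read fraction.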

VERCORS can prove termination of recursive functions. Whenever the clause
decreases r is added to a function contract, VERCORS tries to prove that
the function terminates by showing that all recursive calls strictly decrease
the value of r, while r is bounded from below.


3 Verification of Scheduling Languages with HALIVER 


HALIVER works directly on a HALIDE program and its intermediate
representations, adding and transforming annotations where necessary. The tool is
embedded in the HALIDE compiler. From a user's point of view, the general approach
is as follows, using the front-end and back-end approaches as in Figure 1:


1. Write a HALIDE algorithm and add annotations. Annotations are
the functional specification of the HALIDE algorithm. Since a user can write
an incorrect HALIDE algorithm, its correctness is ideally checked against a
user-supplied specification.

2. The front-end approach produces a PVL encoding. This encoding
contains the algorithm and the specified annotations.

3. VERCORS verifies the encoding. If verification succeeds, we know that
the front-end algorithm conforms to the functional specification. Otherwise,
the verification fails; VERCORS produces a counterexample and we return
to step 1.

4. Write a HALIDE schedule.

5. The back-end approach produces an annotated C file. The tool
automatically generates permission annotations. These allow us to prove
data-race freedom and the absence of out-of-bounds errors. The tool transforms
the annotations and generates additional annotations to match the scheduled
back-end code. This is highly non-trivial, as each for-loop requires precise
annotations to guide VERCORS in the verification. However, the tool ensures
that the same property is verified.

6. VERCORS verifies the back-end C file. If the verification fails, the lines
of C code that caused the failure are given, which can be traced back to the
HALIDE algorithm. The cause of a verification failure may be that
   – the HALIDE compiler produced incorrect code w.r.t. the specifications;
   – more auxiliary annotations are needed in step 1 to guide VERCORS; or
   – a limitation was encountered in one of the tools HALIVER relies on, e.g.,
     VERCORS or the underlying SMT solver.


In the remainder of this section we explain how to write annotations, and ad- 
dress front-end and back-end verification approaches. We also discuss the sound- 
ness and current limitations of the technique. 


3.1 HALIDE Annotations 


HALIVER makes it possible to add annotations when writing a HALIDE algorithm. Intuitively, these annotations form Hoare triples. We consider three types of annotation: pipeline, intermediate and reduction invariant annotations.

In Listing 1, annotations have been added. Lines 1-2 are pipeline annotations: they specify the pre- and postconditions of the whole function and can only contain references to input buffers or output functions. Note that the results are stored directly in the blur_y function. Line 1 specifies how the input and output bounds should be related. Line 2 indicates what the output values are. One can add intermediate annotations after any (update) function call to specify state predicates for particular locations in the pipeline. Examples are the blur_x.ensures and blur_y.ensures state predicates of Listing 1 (lines 6 and 8).

HALIDE functions map coordinates to values pointwise. To achieve a one-to-one relationship between functions and annotations, the intermediate annotations for a function should also specify pointwise how coordinates relate to values. Input buffers, however, can be used freely at any point. For example, blur_x.ensures(blur_x(x,y) >= inp(x+1,y)) is valid, but blur_x.ensures(blur_x(x+1,y) >= 0) is not, because the latter refers to blur_x(x+1,y) as opposed to blur_x(x,y). HALIVER requires this because each point of the function may be computed in parallel in the back-end, so it must be possible to reason about the points individually.
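The point-wise restriction can be illustrated with a small C sketch of our own (not HALIVER output). It assumes hypothetical sizes W and H and arbitrary non-negative input values, computes a blur_x-like function point by point, and checks a post-state predicate of the admissible form: it mentions blur_x only at (x, y), while reading the input buffer freely. Because each point depends only on the input, evaluating the points in any order (and hence in parallel) gives the same result, which is what allows each point to be reasoned about individually.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sizes, chosen only for illustration. */
#define W 8  /* width of blur_x */
#define H 4  /* height of blur_x */

/* Input buffer: two columns wider than blur_x, since x+1 and x+2 are read. */
static int inp[H][W + 2];

/* Fills the input with arbitrary non-negative test values. */
static void init_input(void) {
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W + 2; x++)
            inp[y][x] = (7 * x + 3 * y) % 10;
}

/* Point-wise definition of blur_x, following the blur example. */
static int blur_x_at(int x, int y) {
    return (inp[y][x] + inp[y][x + 1] + inp[y][x + 2]) / 3;
}

/* Computes every point of blur_x, visiting the points forwards or backwards. */
static void compute_blur_x(int out[H][W], bool reverse) {
    for (int i = 0; i < W * H; i++) {
        int k = reverse ? W * H - 1 - i : i;
        out[k / W][k % W] = blur_x_at(k % W, k / W);
    }
}

/* An admissible (point-wise) post-state predicate: it mentions blur_x only
   at (x, y), but may read the input buffer at any point. */
static bool pointwise_post(int out[H][W], int x, int y) {
    int m = inp[y][x];
    if (inp[y][x + 1] > m) m = inp[y][x + 1];
    if (inp[y][x + 2] > m) m = inp[y][x + 2];
    return out[y][x] <= m;  /* an average of non-negative values never exceeds their maximum */
}
```

Calling compute_blur_x with reverse set to false and to true yields identical arrays. Such a test only samples one input, whereas VERCORS reasons about all inputs.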

For ease of annotation, HALIVER automatically generates a pipeline postcondition. This postcondition is derived from the intermediate annotation of the last pipeline function in the algorithm. For Listing 1, HALIVER can generate line 2 (included here for completeness) from line 8.

To prove that a reduction is correct, reduction invariant annotations must be provided for reduction domains. In Listing 2, an example is given of a reduction (line 5) together with its reduction invariant (line 6) and post-state predicate (line 7). Intuitively, a reduction invariant is similar to a loop invariant. First, it must hold before the reduction starts. In our example this means that count(x) has the value 0, which is ensured by the previous definition of count (line 4). Second, it must be preserved by each step of the reduction. In our example, count is bounded by the reduction variable. Finally, after each reduction variable has reached its maximum value, the reduction invariant should imply the post-state
Listing 3. The front-end PVL code for the blur example (Listing 1). We omitted the decreases clauses for brevity.

 1  pure int inp(int x, int y);
 2  pure int inp_x_min(); pure int inp_x_max(); pure int inp_y_min(); pure int inp_y_max();
 3  pure int blur_y_x_min(); pure int blur_y_x_max();
 4  pure int blur_y_y_min(); pure int blur_y_y_max();
 5
 6  ensures \result == (inp(x, y) + inp(x+1, y) + inp(x+2, y))/3;
 7  pure int blur_x(int x, int y) = (inp(x, y) + inp(x+1, y) + inp(x+2, y))/3;
 8
 9  ensures \result == ((inp(x, y) + inp(x+1, y) + inp(x+2, y))/3
10    + (inp(x, y+1) + inp(x+1, y+1) + inp(x+2, y+1))/3
11    + (inp(x, y+2) + inp(x+1, y+2) + inp(x+2, y+2))/3)/3;
12  pure int blur_y(int x, int y) = (blur_x(x, y) + blur_x(x, y+1) + blur_x(x, y+2))/3;
13
14  requires inp_x_min() == blur_y_x_min() && inp_x_max() == blur_y_x_max()+2
15    && inp_y_min() == blur_y_y_min() && inp_y_max() == blur_y_y_max()+2;
16  ensures (\forall int x, int y; blur_y_x_min()<=x && x<=blur_y_x_max() && blur_y_y_min()<=y && y<=blur_y_y_max();
17    blur_y(x,y) == ((inp(x, y) + inp(x+1, y) + inp(x+2, y))/3
18    + (inp(x, y+1) + inp(x+1, y+1) + inp(x+2, y+1))/3
19    + (inp(x, y+2) + inp(x+1, y+2) + inp(x+2, y+2))/3)/3);
20  void pipeline() { }


predicate of the function. In the example, the invariant implies the post-state predicate once r has reached the value 10: the last value of r actually used in the reduction is 9, and r==10 indicates that the reduction is done.
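The three obligations on a reduction invariant can be made concrete with a runtime-checked C sketch of the cnt reduction. Following the PVL encoding of Listing 4, we assume that each step adds 1 to count(x) whenever inp(x, r) > 0, and we read the invariant as 0 <= count(x) <= r; the two input functions are hypothetical test data. The assertions mark where the invariant must hold initially, where it must be preserved, and where it must imply the post-state predicate.

```c
#include <assert.h>

#define R_EXTENT 10  /* reduction domain r in [0, 10), as in the cnt example */

/* Two hypothetical inputs, used only to exercise the sketch. */
static int all_positive(int x, int r) { (void)x; (void)r; return 1; }
static int alternating(int x, int r) { return (x + r) % 2 == 0 ? 5 : -5; }

/* Runs the cnt reduction for one x and checks the reduction invariant
   0 <= count <= r at every stage; VERCORS must prove the same symbolically. */
static int count_checked(int (*inp)(int, int), int x) {
    int count = 0;  /* initial definition: count(x) = 0 */
    int r;
    for (r = 0; r < R_EXTENT; r++) {
        assert(0 <= count && count <= r);      /* invariant holds before step r */
        if (inp(x, r) > 0)
            count += 1;                        /* update step */
        assert(0 <= count && count <= r + 1);  /* invariant preserved for r + 1 */
    }
    /* r == 10: the invariant now implies the post-state predicate */
    assert(r == R_EXTENT && 0 <= count && count <= R_EXTENT);
    return count;
}
```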


3.2 Front-end Verification Approach 


For verifying the algorithm part of a HALIDE program, an annotated HALIDE algorithm is encoded into annotated PVL code. Listings 3 and 4 show how HALIVER translates the examples of Listings 1 and 2, respectively. Input buffers are translated into abstract functions, so that the pipeline is verified w.r.t. arbitrary input. The bounds of input buffers and functions are modelled via functions that are abstract if the bound is unknown, and otherwise return a concrete value. For example, the inp buffer of the blur example is translated to a function inp in Listing 3 (line 1), with its bounds represented by the pure functions on line 2.

Update-free HALIDE functions are translated directly into pure PVL functions, and post-state predicates are translated into postconditions of these functions. In the example, blur_x and blur_y are translated to the functions on lines 6-7 and 9-12 of Listing 3, respectively, and the ensures lines express the postconditions of those functions, using \result to refer to the function's result.

The pre- and postconditions of a HALIDE algorithm are translated into a PVL lemma to be checked by VERCORS. In the example, lines 14-19 of Listing 3 correspond to the pre- and postconditions on lines 1-2 of Listing 1. On line 20, a method called pipeline is given, which represents the HALIDE pipeline.
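To make the front-end encoding tangible, the C sketch below mirrors the structure of Listing 3: the input buffer becomes a function (here a hypothetical concrete stand-in, whereas the PVL encoding keeps inp abstract), blur_x and blur_y become pure functions, and the pipeline postcondition is compared against their definitions over a finite region. Testing one input on a finite region is far weaker than the VERCORS proof for arbitrary input; the sketch only shows the shape of the encoding.

```c
#include <assert.h>  /* for the usage checks */

/* Hypothetical stand-in for the abstract PVL function inp (non-negative values). */
static int inp(int x, int y) { return (x * 31 + y * 17) % 23; }

/* Update-free HALIDE functions become pure functions. */
static int blur_x(int x, int y) {
    return (inp(x, y) + inp(x + 1, y) + inp(x + 2, y)) / 3;
}
static int blur_y(int x, int y) {
    return (blur_x(x, y) + blur_x(x, y + 1) + blur_x(x, y + 2)) / 3;
}

/* The expanded postcondition of blur_y, written over inp only. */
static int blur_y_spec(int x, int y) {
    return ((inp(x, y) + inp(x + 1, y) + inp(x + 2, y)) / 3
          + (inp(x, y + 1) + inp(x + 1, y + 1) + inp(x + 2, y + 1)) / 3
          + (inp(x, y + 2) + inp(x + 1, y + 2) + inp(x + 2, y + 2)) / 3) / 3;
}

/* Checks the pipeline postcondition for all points of an output region
   with inclusive bounds [x_min, x_max] x [y_min, y_max]. */
static int check_pipeline(int x_min, int x_max, int y_min, int y_max) {
    for (int y = y_min; y <= y_max; y++)
        for (int x = x_min; x <= x_max; x++)
            if (blur_y(x, y) != blur_y_spec(x, y))
                return 0;
    return 1;
}
```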

For an update definition, references to the function itself are replaced by references to its previous definition, so that the output of one definition is the input of the next.
Listing 4. The front-end PVL code for the reduction example of Listing 2.

 1  decreases;
 2  pure int inp(int x, int y);
 3  decreases;
 4  pure int inp_x_min(); pure int inp_x_max(); pure int inp_y_min(); pure int inp_y_max();
 5
 6  ensures \result == 0;
 7  decreases;
 8  pure int count0(int x) = 0;
 9
10  requires 0<=r && r<=10;
11  ensures (0<=\result && \result<=r);
12  decreases r;
13  pure int count1r(int x, int r) = r == 0 ? count0(x)
14    : inp(x, r-1) > 0 ? count1r(x, r-1) + 1 : count1r(x, r-1);
15
16  ensures (0<=\result && \result<=10);
17  decreases;
18  pure int count(int x) = count1r(x, 10);


For a reduction, the initialisation and update parts are translated into separate functions, and reduction domain variables are explicitly added as function parameters. Listing 4 illustrates this for the cnt example. The function count0 on line 8 corresponds to the initialisation (line 3 of Listing 2), with the translated post-state predicate on line 6. The function count1r (lines 13-14) corresponds to the update function (line 5 of Listing 2). Note that the annotation on line 10 refers to the reduction domain. The references to r-1 on line 14 are used because the result of the whole computation corresponds to r at its maximum value 10 (see line 18), which is computed by recursively decrementing r. The invariant on line 6 of Listing 2 is translated into the postcondition of count1r (line 11), reflecting that the invariant should hold after each reduction iteration. For the decreases r annotation on line 12, VERCORS tries to prove that this recursive function terminates. The reduction postcondition is represented by the ensures annotation on line 16.
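The recursive translation of the cnt reduction can be checked against the loop that would execute the reduction. In this C sketch, count_init and count_update are our hypothetical names for the initialisation and update functions of Listing 4, and inp is arbitrary test data. The recursion on r - 1 terminates because r strictly decreases towards 0, which is the argument behind the decreases r clause.

```c
#include <assert.h>

#define R_MAX 10  /* extent of the reduction domain r in [0, 10) */

/* Hypothetical input values for the cnt example (arbitrary test data). */
static int inp(int x, int r) { return (x + 3 * r) % 5 - 2; }

/* Initialisation: count(x) = 0. */
static int count_init(int x) { (void)x; return 0; }

/* Update step with the reduction variable r as an explicit parameter:
   count_update(x, r) is the value after the first r reduction steps.
   The recursion on r - 1 terminates because r decreases towards 0. */
static int count_update(int x, int r) {
    if (r == 0) return count_init(x);
    return inp(x, r - 1) > 0 ? count_update(x, r - 1) + 1 : count_update(x, r - 1);
}

/* The whole reduction corresponds to r at its maximum value. */
static int count(int x) { return count_update(x, R_MAX); }

/* Iterative reference: the loop that executes the reduction. */
static int count_iterative(int x) {
    int c = count_init(x);
    for (int r = 0; r < R_MAX; r++)
        if (inp(x, r) > 0) c++;
    return c;
}
```

For x = 0, the input is positive exactly at r = 1, 3, 6 and 8, so count(0) evaluates to 4 both recursively and iteratively.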


Guarantees. For the front-end verification approach, HALIVER straightforwardly encodes a HALIDE function without reductions, as it defines the function pointwise in PVL. For reductions, HALIVER mimics the iterative updates with recursion, as shown in the cnt example of Listings 2 and 4. HALIVER adds decreases clauses to check that the recursive functions terminate.

With HALIVER's approach, functional correctness of the algorithm part can 
be proven. Since memory safety depends on how a HALIDE algorithm is compiled 
into actual code according to a schedule, this is checked using the back-end 
verification approach. 


3.3 Back-end Verification Approach 


For verifying a HALIDE algorithm with a schedule, HALIVER adds annotations to the generated C code that can be checked by VERCORS. First, HALIVER generates read and write permissions and preconditions for functions used in definitions. Generating these permissions keeps the annotations of HALIDE algorithms concise, since the user does not have to specify permissions. Second, HALIVER transforms the annotations and adds them to the intermediate representation used by the HALIDE compiler. Finally, HALIVER adds the annotations to the code during the code generation of the HALIDE compiler.

Listing 5. A schedule for the blur example (Listing 1), together with the loop nest the HALIDE compiler produces, given in the intermediate representation of HALIDE. The blur_y bounds are assumed to be from 0 up to 1,024 for dimensions x and y.

 1  blur_y.split(y, yo, yi, 8).parallel(yo).split(x, xo, xi, 2).unroll(xi);
 2  blur_x.store_at(blur_y, yo).compute_at(blur_y, yi).split(x, xo, xi, 2).unroll(xi);
 3  // Below is the loop nest produced (not part of the schedule)
 4  produce blur_y:
 5    parallel y.yo in [0, 127]:
 6      store blur_x:
 7        for y.yi in [0, 7]:
 8          produce blur_x:
 9            for y:
10              for x.xo in [0, 511]:
11                unrolled x.xi in [0, 1]:
12                  blur_x(...) = ...
13          consume blur_x:
14            for x.xo in [0, 511]:
15              unrolled x.xi in [0, 1]:
16                blur_y(...) = ...


Annotation Generation. Since HALIDE algorithms consist of pure pointwise functions, permissions are relatively straightforward: for a function f(x,...), HALIVER generates the write permission Perm(f(x,...),1\1). For the blur example from Listing 1, HALIVER generates blur_x.context(Perm(blur_x(x,y), 1\1)) and blur_y.context(Perm(blur_y(x,y), 1\1)) for the functions blur_x and blur_y, respectively.

For update functions and reductions, HALIVER generates (1) read permis- 
sions for function values that are not being updated, and (2) a pre-state predi- 
cate, using the post-state predicate of the previous update step. 

Once a function is fully defined, read permission is given to all values wherever 
the function is used, along with a context predicate containing any intermediate 
annotations of the function. 


Transformation of Annotations. Next, HALIVER transforms the annota- 
tions according to the schedule given by the user and associates them with the 
corresponding parts of the optimised HALIDE program expressed in HALIDE's 
intermediate language. 

HALIVER supports the split, fuse, parallel, unroll, store_at, reorder and compute_at scheduling directives. Of the most commonly used directives in the HALIDE example apps, only vectorize is not supported, because VERCORS does not yet support verification of vectorised code as produced by HALIDE⁶. With these directives, HALIVER provides the means to verify optimised programs w.r.t. memory locality, parallelism and recomputation, which is the optimisation space in which HALIDE resides [22]. We illustrate the meaning of these directives with an example. Listing 5 shows a schedule for blur on lines 1-2, and below that the loop nest structure of the resulting program. Loop nests are program statements of nested for loops. The loops can be executed sequentially, or be parallelized, unrolled or vectorized. The allocation of space for a function result is indicated by store, and produce and consume refer to writing and reading function results, respectively. This loop nest corresponds to the actual code produced by the HALIDE compiler.

Assuming that the output dimensions in the example are both of size 1,024, the directive split(y, yo, yi, 8) (line 1 of Listing 5) splits the dimension y into two nested dimensions y.yo (line 5) and y.yi (line 7) of sizes 128 and 8, respectively. HALIVER similarly renames references to y in annotations. The parallel(yo) directive (line 1) expresses that y.yo should be executed in parallel (line 5). The store_at(blur_y, yo) directive (line 2) expresses that blur_x must be stored at the start of the y.yo loop (line 6). The directive compute_at(blur_y, yi) (line 2) defines that the values for blur_x should be produced at y.yi (line 8). The directive unroll(xi) (lines 1 and 2) expresses that the dimension xi should be completely unrolled.

The for loops are sequential. In this example, fuse and reorder are not 
used; they express that two dimensions should be fused into one and the nesting 
order of the loops should be changed, respectively. 
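The split directives of Listing 5 amount to the index substitution y = yo*8 + yi and x = xo*2 + xi in the flattened array blur_y[y*1024+x]. The C sketch below is our own sanity check, assuming the 1,024 x 1,024 output of the example: it confirms that the split loop nest writes every element exactly once. This disjointness is what the per-iteration write permissions let VERCORS exploit for the parallel y.yo loop.

```c
#include <assert.h>  /* for the usage checks */
#include <stdlib.h>

/* Sizes from the running example: 1024 x 1024 output, y split by 8, x by 2. */
#define N 1024
#define Y_FACTOR 8
#define X_FACTOR 2

/* Flattened index for the split loop variables: y = yo*8 + yi, x = xo*2 + xi. */
static int split_index(int yo, int yi, int xo, int xi) {
    return (yo * Y_FACTOR + yi) * N + xo * X_FACTOR + xi;
}

/* Returns 1 if the split loop nest writes every element of the N*N output
   exactly once, i.e. writes of different (yo, yi, xo, xi) never overlap. */
static int split_covers_exactly_once(void) {
    unsigned char *hits = calloc((size_t)N * N, 1);
    if (hits == NULL) return 0;
    int ok = 1;
    for (int yo = 0; yo < N / Y_FACTOR; yo++)
        for (int yi = 0; yi < Y_FACTOR; yi++)
            for (int xo = 0; xo < N / X_FACTOR; xo++)
                for (int xi = 0; xi < X_FACTOR; xi++) {
                    int k = split_index(yo, yi, xo, xi);
                    /* out of range, or already written by another iteration */
                    if (k < 0 || k >= N * N || hits[k]++ != 0) ok = 0;
                }
    for (int k = 0; k < N * N; k++)  /* every element must have been written */
        if (hits[k] != 1) ok = 0;
    free(hits);
    return ok;
}
```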

HALIVER moves bottom-up through the program, constructing loop invariants for each loop by taking the constructed state predicates from the loop body and extending them with quantifications over the loop variables. Below, we give an example of this exact process for the blur example of Listing 1⁷.


Encoding of HALIDE Program. Finally, HALIVER adds annotations to the C code during the code generation of the HALIDE compiler. As an example, we show how HALIVER adds annotations to the blur_y function of Listing 1 with the schedule of Listing 5. The result can be found in Listing 6. It shows the structure of the whole program, but focusses on the code below the consume blur_x node (line 13 of Listing 5).

First, HALIVER updates its pipeline annotations (lines 1-2 of Listing 1) to match the flattened array structure that the HALIDE back-end uses, and adds them to the function contract (lines 8-15 of Listing 6). HALIVER also uses the HALIDE definition of division (hdiv), i.e., Euclidean division⁸ with x/0 = 0.
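For reference, Euclidean division with the convention x/0 = 0 can be written in C as follows. This is a straightforward sketch based on truncating C division; it is neither HALIDE's bit-operator formulation nor the div_eucl declared in Listing 6, and we assume the overflowing case INT_MIN / -1 never occurs.

```c
#include <assert.h>  /* for the usage checks */

/* Euclidean division with the convention x / 0 == 0 (cf. hdiv).
   The quotient q is chosen so that the remainder x - q*y lies in [0, |y|).
   Assumption: the overflowing case x == INT_MIN, y == -1 is never used. */
static int hdiv(int x, int y) {
    if (y == 0) return 0;      /* division by zero yields 0 */
    int q = x / y;             /* C division truncates toward zero */
    int r = x % y;             /* so this remainder has the sign of x (or is 0) */
    if (r < 0)                 /* shift q by one so that the remainder becomes non-negative */
        q = (y > 0) ? q - 1 : q + 1;
    return q;
}
```

For non-negative operands this coincides with ordinary C division; the two differ only when x is negative and not divisible by y, e.g. hdiv(-7, 3) is -3 whereas -7 / 3 is -2 in C.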


6 The vectorize scheduling directive is the same as the unroll directive from the perspective of transforming annotations, so the two can be treated identically, and HALIVER already does so.

7 For the interested reader, we explain the approach more generally in Appendix C of the version available at https://arxiv.org/abs/2401.10778.

8 The HALIDE compiler uses bit operators to define Euclidean division. However, bit operators are not supported in VERCORS, so HALIVER uses an equivalent definition.
Listing 6. The C code and annotations the HALIDE compiler produces together with HALIVER for the function blur_y, focussing on the consume blur_x node (see line 13 of Listing 5). The complete encoding for the blur_y pipeline is available in Appendix B of the version available at https://arxiv.org/abs/2401.10778.

struct halide_dimension_t {int32_t min, max;};
struct buffer {int32_t dimensions; struct halide_dimension_t *dim; int32_t *host;};
int div_eucl(int x, int y);
//@ pure int hdiv(int x, int y) = y == 0 ? 0 : div_eucl(x, y);
//@ pure int p_i(int x);
/*@ ... // Buffers annotations
context (\forall int x, int y; 0<=x && x<1026 && 0<=y && y<1026; inpb->host[y*1026+x] == p_i(y*1026+x));
// Pipeline preconditions
requires inpb->dim[0].min == blur_yb->dim[0].min && inpb->dim[0].max == blur_yb->dim[0].max+2;
requires inpb->dim[1].min == blur_yb->dim[1].min && inpb->dim[1].max == blur_yb->dim[1].max+2;
// Pipeline postconditions
ensures (\forall int x, int y; 0<=x && x<1024 && 0<=y && y<1024; blur_yb->host[y*1024+x] == hdiv(
  hdiv(inpb->host[y*1026+x+1027]+inpb->host[y*1026+x+1028]+inpb->host[y*1026+x+1026],3) +
  hdiv(inpb->host[y*1026+x+2053]+inpb->host[y*1026+x+2054]+inpb->host[y*1026+x+2052],3) +
  hdiv(inpb->host[y*1026+x+1]+inpb->host[y*1026+x+2]+inpb->host[y*1026+x],3),3)); @*/
int blur_3(struct buffer *inpb, struct buffer *blur_yb) {
  int32_t* blur_y = blur_yb->host;
  int32_t* inp = inpb->host;
  // produce blur_y
  #pragma omp parallel for
  for (int yo = 0; yo<0 + 128; yo++)
  ... // Annotations blur_y.y.yo
  {
    int64_t _2 = 10240;
    int32_t *blur_x = (int32_t *)malloc(sizeof(int32_t) * _2);
    int32_t _t11 = (yo * 8);
    // Annotations blur_y.y.yi
    for (int yi = 0; yi<0 + 8; yi++)
    { ... // produce blur_x
      // consume blur_x
      int32_t _t16 = (yi + _t11) * 512;
      int32_t _t15 = yi * 512;
      /*@ loop_invariant 0<=xo && xo<=0 + 512;
      loop_invariant (\forall* int x, int y; 0<=x && x<1024 && yo*8<=y && y<yo*8 + 10;
        Perm(&blur_x[(y-yo*8)*1024+x], 1\2));
      loop_invariant (\forall int x, int y; 0<=x && x<1024 && yo*8+yi<=y && y<=yo*8+yi+2;
        blur_x[(y-yo*8)*1024+x] == hdiv(p_i(y*1026+x) + p_i(y*1026+x+1) + p_i(y*1026+x+2), 3));
      loop_invariant (\forall* int xif, int xof; 0<=xof && xof<512 && 0<=xif && xif<2;
        Perm(&blur_y[(yo*8+yi)*1024+xof*2+xif], 1\1));
      loop_invariant (\forall int xof, int xif; 0<=xof && xof<xo && 0<=xif && xif<2; blur_y[(yo*8+yi)*1024+xof*2+xif] ==
        hdiv(hdiv(p_i((yo*8+yi)*1026+xof*2+xif) + p_i((yo*8+yi)*1026+xof*2+xif+1) + p_i((yo*8+yi)*1026+xof*2+xif+2), 3) +
        hdiv(p_i((yo*8+yi)*1026+xof*2+xif+1026) + p_i((yo*8+yi)*1026+xof*2+xif+1027) + p_i((yo*8+yi)*1026+xof*2+xif+1028), 3) +
        hdiv(p_i((yo*8+yi)*1026+xof*2+xif+2052) + p_i((yo*8+yi)*1026+xof*2+xif+2053) + p_i((yo*8+yi)*1026+xof*2+xif+2054), 3), 3)); @*/
      for (int xo = 0; xo<0 + 512; xo++)
      {
        int32_t _t9 = (xo + _t15);
        blur_y[(xo + _t16) * 2] = div_eucl(blur_x[_t9 * 2] + blur_x[_t9 * 2 + 1024] +
          blur_x[_t9 * 2 + 2048], 3);
        blur_y[(xo + _t16) * 2 + 1] = div_eucl(blur_x[_t9 * 2 + 1] + blur_x[_t9 * 2 + 1025] +
          blur_x[_t9 * 2 + 2049], 3);
      } // for xo
    } // for yi
    free(blur_x);
  } // for yo
  return 0;
}
Next, HALIVER transforms the annotations added to the blur_y function before it adds them to any loop nest. The HALIDE compiler flattens the two-dimensional function blur_y(x,y) into a one-dimensional array blur_y[y*1024+x], so HALIVER does the same for all function references in the annotations. Next, from the schedule, the directive split(x, xo, xi, 2) splits x into xo and xi of sizes 512 and 2, respectively. A similar split is performed for y. The generated annotation context(Perm(blur_y(x,y), 1\1)) becomes context(Perm(&blur_y[(yo*8+yi)*1024+xo*2+xi], 1\1)).

For the annotation ensures(blur_y(x,y)==(((inp(x,y)+..., HALIVER replaces the calls to inp(x,y) with calls to an abstract pure function p_i. This is done because quantifier instantiation in VERCORS can become unstable if inp is used frequently. Where inp is used in the code, HALIVER adds annotations stating that inp and p_i have the same value (line 7 of Listing 6).

HALIVER adds these annotations to the first loop nest, working bottom-up. In Listing 5 this is xi, but since this loop is unrolled, no additional annotations are needed. After passing this loop nest, everything that held for xi==0 and xi==1 now holds. HALIVER changes the annotations by quantifying over xi's domain: it uses xif as the bound variable and replaces references to xi by xif. The resulting permissions are (\forall* int xif; 0<=xif && xif<2; Perm(&blur_y[(yo*8+yi)*1024+xo*2+xif], 1\1)). The other annotations are processed in a similar way.

Next, HALIVER arrives at the loop nest for xo, which needs loop invariants. First, the tool adds the bounds of the xo dimension (line 33 of Listing 6). Each annotation is transformed depending on whether it was a requires, ensures or context annotation. The write permission (a context annotation) should hold before the loop starts and after the loop ends. Therefore, HALIVER adds the permission, but quantifies over dimension xo, which results in a loop invariant (lines 38-39 of Listing 6). The ensures annotation does not hold at the start of the loop, but after each iteration of the loop it holds for one more value of xo. Therefore, HALIVER quantifies over xof bounded by zero and the iteration variable xo, and replaces occurrences of xo with xof, which leads to a loop invariant (lines 40-43 of Listing 6). For loops above this loop nest, the ensures annotation holds for the whole domain of xo, resulting in ensures (\forall int xof, int xif; 0<=xof && xof<512 && 0<=xif && xif<2; blur_y[(yo*8+yi)*1024+xof*2+xif] == ...). This annotation is added to the parallel for loop.
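The shape of the xo loop invariant (bounds on xo, plus a quantification over xof < xo) can be checked at runtime. The C sketch below computes one row of blur_y from three blur_x rows with xi unrolled, and asserts the invariant on entry to each iteration and on exit. Here p_i is a hypothetical concrete stand-in for the abstract function, and the input row stride of 1026 and row width of 1024 follow Listing 6. Checking one row at runtime only illustrates the invariant; it does not replace the proof.

```c
#include <assert.h>
#include <stdbool.h>

#define W 1024          /* width of a blur_y row in the running example */
#define XO_EXTENT 512   /* x is split into xo (512 values) and xi (2 values) */
#define STRIDE (W + 2)  /* row stride of the input buffer, as in Listing 6 */

/* Hypothetical stand-in for the abstract input function p_i (non-negative). */
static int p_i(int k) { return (k * 37) % 101; }

/* Specification of blur_y at (x, y), written directly over p_i. */
static int spec(int x, int y) {
    int s = 0;
    for (int d = 0; d < 3; d++) {
        int base = (y + d) * STRIDE + x;
        s += (p_i(base) + p_i(base + 1) + p_i(base + 2)) / 3;
    }
    return s / 3;
}

/* The xo loop invariant: xo is within bounds, and every value for
   xof < xo and xif in [0, 2) already equals its specification. */
static bool xo_invariant(const int *row, int y, int xo) {
    if (!(0 <= xo && xo <= XO_EXTENT)) return false;
    for (int xof = 0; xof < xo; xof++)
        for (int xif = 0; xif < 2; xif++)
            if (row[xof * 2 + xif] != spec(xof * 2 + xif, y)) return false;
    return true;
}

/* Computes row y of blur_y from three blur_x rows with xi unrolled,
   asserting the invariant on entry to every iteration. */
static bool compute_row_checked(int y) {
    static int bx[3][W];  /* the blur_x rows y, y+1 and y+2 */
    static int row[W];    /* one row of blur_y */
    for (int d = 0; d < 3; d++)
        for (int x = 0; x < W; x++) {
            int base = (y + d) * STRIDE + x;
            bx[d][x] = (p_i(base) + p_i(base + 1) + p_i(base + 2)) / 3;
        }
    int xo;
    for (xo = 0; xo < XO_EXTENT; xo++) {
        assert(xo_invariant(row, y, xo));
        /* unrolled xi == 0 and xi == 1 */
        row[xo * 2] = (bx[0][xo * 2] + bx[1][xo * 2] + bx[2][xo * 2]) / 3;
        row[xo * 2 + 1] = (bx[0][xo * 2 + 1] + bx[1][xo * 2 + 1] + bx[2][xo * 2 + 1]) / 3;
    }
    /* on exit xo == 512, so the invariant covers the whole row */
    return xo_invariant(row, y, xo);
}
```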

After constructing the produce node for blur_y, the produce node for blur_x is constructed in a similar way. The bound inferencer of HALIDE detects that it only needs to compute blur_x for y values from 8*yo+yi up to 8*yo+yi+2, and the annotations are transformed accordingly. After the produce node, blur_x is consumed (line 30 of Listing 6). So for each loop below the consume statement, HALIVER adds read permission (lines 34-35 of Listing 6) and the post-state predicate of blur_x (lines 36-37 of Listing 6) as context annotations. For the loop over xo, this means they are valid for any value of xo.
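The row range found by bound inference also explains the size of the blur_x buffer allocated per y.yo iteration in Listing 6 (10240 = 10 x 1024 elements): for yi in [0, 7], the rows yo*8+yi up to yo*8+yi+2 span at most 10 rows relative to yo*8. The following C sketch is our own check under those assumptions; it verifies that every such row fits in a 10-row buffer indexed as blur_x[(y-yo*8)*1024+x], but not in a 9-row one.

```c
#include <assert.h>
#include <stdbool.h>

#define YI_EXTENT 8   /* y.yi ranges over [0, 7] */
#define WIDTH 1024    /* row width of blur_x */

/* For a given yo, checks that all blur_x rows needed for blur_y rows
   yo*8+yi (namely rows yo*8+yi .. yo*8+yi+2) fit into a buffer of
   buffer_rows rows indexed as blur_x[(y - yo*8)*WIDTH + x]. */
static bool footprint_fits(int yo, int buffer_rows) {
    for (int yi = 0; yi < YI_EXTENT; yi++)
        for (int y = yo * 8 + yi; y <= yo * 8 + yi + 2; y++) {
            int first = (y - yo * 8) * WIDTH;  /* index of (x = 0, y) */
            int last = first + WIDTH - 1;      /* index of (x = WIDTH-1, y) */
            if (first < 0 || last >= buffer_rows * WIDTH) return false;
        }
    return true;
}
```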


Guarantees. With the back-end verification approach, HALIVER can prove that the optimised code produced by the HALIDE compiler is correct w.r.t. specifications. Memory safety is proven without any additional effort, as the permission annotations for this are generated automatically. For functional correctness, a specification needs to be provided. For any non-inlined function, an intermediate annotation is required to guide VERCORS towards successful functional verification.

Table 1. Number of lines of code and annotations for different HALIDE algorithms, schedules and resulting programs, and the verification times required by VERCORS to prove memory safety, given that no annotations are provided by the user. The letters after each schedule denote the directives used: c (compute_at), f (fuse), p (parallel), r (reorder), s (split), st (store_at) and u (unroll). F stands for verification failed. Times marked with ! are inconsistent, i.e., they are successfully verified, but can also sometimes fail or time out.

| Name                     | Schedule          | HALIDE LoC | Sched. Dir. | C LoC | C LoA. | Loops | T. (s) |
|--------------------------|-------------------|------------|-------------|-------|--------|-------|--------|
| blur                     | V0                | 38         | 0           | 178   | 60     | 2     | 18     |
| blur                     | V1-{f,p}          | 38         | 2           | 172   | 56     | 1     | 19     |
| blur                     | V2-{c,p,r,s}      | 38         | 6           | 212   | 74     | 6     | 29     |
| blur                     | V3-{c,p,s,st,u}   | 38         | 8           | 211   | 72     | 5     | 24     |
| hist                     | V0                | 71         | 0           | 299   | 98     | 11    | 30     |
| hist                     | V1-{c,p,r,u}      | 71         | 4           | 308   | 99     | 11    | 38     |
| hist                     | V2-{c,p,r,u}      | 71         | 6           | 311   | 105    | 13    | 48     |
| hist                     | V3-{c,p,r,u}      | 71         | 13          | 312   | 101    | 13    | 48     |
| conv_layer               | V0                | 44         | 0           | 273   | 148    | 7     | 90     |
| conv_layer               | V1-{c,f,p,u}      | 44         | 4           | 281   | 145    | 8     | 97     |
| conv_layer               | V2-{p,r,s,u}      | 44         | 6           | 302   | 166    | 10    | 209    |
| conv_layer               | V3-{c,p,r,s,u}    | 44         | 15          | 279   | 148    | 7     | 168    |
| gemm                     | V0                | 70         | 0           | 218   | 105    | 3     | 41     |
| gemm                     | V1-{c,p,r,s}      | 70         | 8           | 274   | 136    | 10    | 89     |
| gemm                     | V2-{c,p,r,s}      | 70         | 16          | 342   | 173    | 19    | 196!   |
| gemm                     | V3-{c,f,p,r,s,u}  | 70         | 24          | 451   | 221    | 31    | F      |
| auto_viz                 | V0                | 112        | 0           | 443   | 118    | 19    | 35     |
| auto_viz                 | V1-{c}            | 112        | 9           | 402   | 139    | 23    | 180    |
| auto_viz                 | V2-{c,p}          | 112        | 12          | 440   | 156    | 27    | 170    |
| auto_viz                 | V3-{c,p,r,s}      | 112        | 27          | 443   | 152    | 25    | 105    |
| camera_pipe              | {c,p,r,s,st}      | 345        | 27          | 701   | 236    | 25    | F      |
| bilateral_grid           | {c,p,r,u}         | 88         | 18          | 562   | 180    | 39    | 140    |
| depthwise_separable_conv | {c,p,r,s}         | 94         | 13          | 562   | 315    | 44    | 480    |

The approach is sound, but not necessarily complete. One concern is that, since we have not formally proved the correctness of the transformation, our implementation could in principle be wrong. HALIVER addresses this by keeping the pipeline annotations very close to what the user has written. These pipeline annotations act as the formal contract that is verified, and the user can inspect them at any time. If an intermediate annotation is not correctly transformed, the verification fails, so the approach remains sound but is not complete. Of course, we did not intentionally construct any transformation to be wrong, but even if there is an oversight, soundness is preserved. Moreover, in Section 4 we show that our approach works for real-world examples.


4 Evaluation 


The goal of the evaluation of HALIVER is four-fold. (1) We evaluate whether the front-end verification approach of HALIVER can verify functional correctness properties for a representative set of HALIDE algorithms. (2) For the back-end verification approach, the annotations that HALIVER generates and transforms should lead to successful verification for a representative set of HALIDE programs, with schedules that use the most important scheduling directives in different combinations. (3) We evaluate the verification speed for front-end and back-end verification. (4) Lastly, we evaluate how many annotations are needed in HALIVER compared to manually annotating the generated C programs⁹.

Table 2. Number of lines of code and annotations for different HALIDE algorithms, schedules and resulting programs, and the verification times required by VERCORS. F and ! have the same meaning as in Table 1.

| Name       | Schedule          | HALIDE LoC | HALIDE LoA | Front-end T. (s) | Sched. Dir. | C LoC | C LoA | Loops | C T. (s) | LoA incr. |
|------------|-------------------|------------|------------|------------------|-------------|-------|-------|-------|----------|-----------|
| blur       | V0                | 38         | 2          | 8                | 0           | 178   | 63    | 2     | 21       | 31.5x     |
| blur       | V1-{f,p}          | 38         | 2          | 8                | 2           | 172   | 58    | 1     | 23       | 29.0x     |
| blur       | V2-{c,p,r,s}      | 38         | 2          | 8                | 6           | 212   | 83    | 6     | 52       | 41.5x     |
| blur       | V3-{c,p,s,st,u}   | 38         | 2          | 8                | 8           | 211   | 79    | 5     | 97!      | 39.5x     |
| hist       | V0                | 71         | 10         | 8                | 0           | 299   | 118   | 11    | 34       | 11.8x     |
| hist       | V1-{c,p,r,u}      | 71         | 10         | 8                | 4           | 308   | 118   | 11    | 47       | 11.8x     |
| hist       | V2-{c,p,r,u}      | 71         | 10         | 8                | 6           | 311   | 123   | 13    | 56       | 12.3x     |
| hist       | V3-{c,p,r,u}      | 71         | 10         | 8                | 13          | 312   | 125   | 13    | 64       | 12.5x     |
| conv_layer | V0                | 44         | 7          | 8                | 0           | 273   | 177   | 7     | 111      | 25.3x     |
| conv_layer | V1-{c,f,p,u}      | 44         | 7          | 8                | 4           | 281   | 174   | 8     | 108      | 24.9x     |
| conv_layer | V2-{p,r,s,u}      | 44         | 7          | 8                | 6           | 302   | 204   | 10    | 283      | 29.1x     |
| conv_layer | V3-{c,p,r,s,u}    | 44         | 7          | 8                | 15          | 279   | 177   | 7     | 207      | 25.3x     |
| gemm       | V0                | 70         | 12         | 7                | 0           | 218   | 120   | 3     | 43       | 10.0x     |
| gemm       | V1-{c,p,r,s}      | 70         | 12         | 7                | 8           | 274   | 169   | 10    | 133      | 14.1x     |
| gemm       | V2-{c,p,r,s}      | 70         | 12         | 7                | 16          | 342   | 230   | 19    | 368      | 19.2x     |
| gemm       | V3-{c,f,p,r,s,u}  | 70         | 12         | 7                | 24          | 451   | 310   | 31    | F        | 25.8x     |
| auto_viz   | V0                | 112        | 15         | 8                | 0           | 443   | 158   | 19    | 152!     | 10.5x     |
| auto_viz   | V1-{c}            | 112        | 15         | 8                | 9           | 402   | 210   | 23    | 216      | 14.0x     |
| auto_viz   | V2-{c,p}          | 112        | 15         | 8                | 12          | 440   | 235   | 27    | 230!     | 15.7x     |
| auto_viz   | V3-{c,p,r,s}      | 112        | 15         | 8                | 27          | 443   | 229   | 25    | 192!     | 15.3x     |


Set-up. We used a machine with an 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz and 32GB of RAM, running Ubuntu 23.04.

We used eight characteristic programs from the HALIDE repository¹⁰. These are representative HALIDE algorithm examples. They cover all scheduling directives supported by HALIVER, in commonly used combinations. We removed any scheduling directives that we do not support. VERCORS is unable to deal with large dimensions that are unrolled, so we removed some unroll directives as well¹¹.

The original schedule, as found in the HALIDE repository, is indicated with 
V3 if there are multiple schedules present. For five of these programs we defined 
annotations that express functional properties. These five programs are also 
evaluated with the standard schedule (V0), which tries to inline functions as 
much as possible, and two additional schedules (V1 and V2) we constructed. 


Memory Safety Results. We evaluate 8 HALIDE programs with 23 schedules in total, and prove data-race freedom and memory safety for 21 of them. No user-provided annotations are needed. The results can be found in Table 1.

For each case, we provide: the number of lines of code (LoC)¹² for the HALIDE algorithm without the schedule, and the number of scheduling directives (Sched. Dir.). For the generated programs (C) we list: lines of code (LoC), lines of annotations (LoA.) and the number of (parallel) loops (Loops). These numbers indicate how large programs tend to become w.r.t. HALIDE algorithms, and how much annotation effort would be required to annotate the programs manually. Verification running times (T. (s)) are given in seconds, averaged over five runs.

9 The experiments can be found at https://github.com/sakehl/HaliVerExperiments.

10 https://github.com/halide/Halide/tree/main/apps; gemm is part of linear algebra.

11 For the interested reader, we explain this further in Appendix A of the version available at https://arxiv.org/abs/2401.10778.

12 These lines are counted automatically and indicate the size of the programs.

For camera_pipe, VERCORS gives a verification failure. It could not prove a loop_invariant, but after simplifying parts of the generated C program not related to this specific invariant, verification succeeds. This indicates that the program is too complex for the underlying solvers. We also coded this example in similar PVL code instead of C, which verifies in 193s. We suspect the failure is caused by quantifier instantiation: too many quantifiers are instantiated, so that the SMT solver on which VERCORS relies stops exploring the quantifier instantiations that are needed for successful verification.

For gemm V3, verification fails because VERCORS does not sufficiently rewrite the annotations of the fuse directive¹³.


Functional Correctness Results. Next, we evaluate five¹⁴ algorithms with annotations and 20 schedules, both for the front-end and the back-end. HALIVER proves functional correctness for the front-end, and both functional correctness and data-race freedom and memory safety for the back-end, for 19 of the 20 schedules. These results are given in Table 2. The table additionally lists the number of user-provided annotation lines (LoA.), and the last column (LoA incr.) indicates the growth of the annotations. The annotations of the C file (LoA) contain both the generated annotations, which are already present in Table 1, and the transformed user annotations.
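Our reading of the LoA incr. column of Table 2 is the ratio of annotation lines in the generated C file to the user-written HALIDE annotation lines, rounded to one decimal; this is consistent with the table entries (e.g. 63/2 = 31.5 for blur V0, and 169/12, about 14.1, for gemm V1). A small C helper computing this ratio in tenths with integer arithmetic (the round-half-up rounding is our assumption) is:

```c
#include <assert.h>  /* for the usage checks */

/* Annotation growth in tenths: round(10 * c_loa / halide_loa),
   rounding halves upwards; assumes both arguments are positive. */
static int loa_increase_tenths(int c_loa, int halide_loa) {
    return (20 * c_loa + halide_loa) / (2 * halide_loa);
}
```

For example, loa_increase_tenths(177, 7) gives 253, i.e. the 25.3x reported for conv_layer V0.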

For optimised programs, the annotation size is strongly related to the number of loops, as each loop needs its own loop invariants. Front-end verification is successful for all examples and is fast compared to back-end verification. For the C files produced by the back-end verification approach, verification time tends to increase as the number of scheduling directives increases. Here, gemm V3 also fails, for the same reason as outlined above.


Inconsistent Results. For gemm V2 in the memory safety benchmarks, and for blur V3 and auto_viz V0, V2 and V3 in the functional correctness benchmarks, VERCORS does not always succeed with the verification. In the case of gemm V2, the verification sometimes hangs and is then timed out after 10 minutes. In the other cases, VERCORS sometimes gives a verification failure. This inconsistency is due to the non-deterministic nature of the underlying SMT solvers.


Conclusions. With the front-end verification approach of HALIVER we are able 
to prove functional correctness properties for representative HALIDE algorithms. 


13 For the interested reader, we explain this further in Appendix A of the version available at https://arxiv.org/abs/2401.10778.

14 The other three algorithms from the memory safety results are typical image processing pipelines. They are therefore less suitable for checking functional correctness and are not used here.
Using HALIVER's back-end verification approach, the tool provides correct annotations for the generated C programs. VERCORS successfully verifies all but two programs; in the unsuccessful cases, HALIVER runs into limitations of the underlying tools. All verified programs are verified within ten minutes. Finally, the manual annotation effort required is an order of magnitude larger than the effort required for HALIVER's approach.


5 Related Work 


There is much work on optimising program transformations, applied either automatically or manually, sometimes using scheduling languages [23,28]. The vast majority of this work does not address functional correctness.

Work on functional correctness consists of techniques that apply verification 
every time a program is transformed, and techniques that verify the compiler. 

Liu et al. propose an approach inspired by scheduling languages, with proof obligations generated when a program is optimised, for automatic verification using Coq. The COGENT language uses refinement proofs to be verified in ISABELLE/HOL. However, it does not separate algorithms from schedules. In two other approaches, an integer constraint solver and a proof checker, respectively, are used to verify the transformation of a program. In all these approaches, semantics preservation is the focus, as opposed to specifying the intended behaviour. Model-to-model transformations can be verified w.r.t. the preservation of functional properties [21]. However, that work targets models, not code.

Regarding the verification of compilers, COMPCERT is a framework involving a formally verified C compiler. In [19], HALIDE's Term Rewriting System, used to reason about the applicability of schedules, is verified using Z3 and Coq.
These approaches do not require verification every time an optimisation is ap- 
plied, but verifying the compiler is time-consuming and complex, and has to be 
redone whenever the compiler is updated. Furthermore, they focus on semantics- 
preservation, not the intended behaviour of individual programs. 

ALPINIST is the most closely related work. This tool automatically optimises PVL code, along with its annotations, for verification with VERCORS. It allows the specification of intended behaviour, but it does not separate algorithms from schedules, forcing the user to reason about the technical details of parallelisation.


6 Conclusions & Future Work 


We presented HALIVER, a tool for verifying optimised code by exploiting the 
strengths of scheduling languages and deductive verification. It allows focussing 
on functionality when annotating programs, keeping annotations succinct. 

For future work, we want to extend the HALIVER tool with aspects not directly supported by VERCORS, such as vectorisation. A related master thesis defines a natural semantics for HALIDE. We want to formalise our front-end PVL encoding with an axiomatic semantics to match this semantics. We also want to investigate the inconsistent results and see whether annotations with quantifiers can be rephrased to allow VERCORS to be more consistent. In this work, we have focussed on parallel CPU code, but we have designed our approach to be extendable to GPU code produced by HALIDE.

With the current expressiveness of the annotations, when reduction domains are present, HALIVER proves functional correctness only for specific inputs. For example, in the listing shown earlier, we can prove that count(x) == 9 if we require that input(x,y) == x. This can also be done for any input if the reduction domain is of known size, but then many annotations are needed. To make the annotations concise, a user needs to be able to use axiomatic data types and pure functions in their annotations. We expect that these annotations can be transformed by our approach in a similar way, and that this is thus orthogonal to this contribution, but this is planned as future work.

Most HALIDE programs use floating point numbers. These are currently mod- 
elled as reals in VERCORS. How to efficiently verify programs with floats using 
deductive verifiers is still an open research question. Once this is addressed, 
HALIVER will be able to give better guarantees. 

We require that the bounds of a HALIDE program are set to concrete values for our back-end verification approach. HALIVER transforms the annotations in the same way for unknown bounds, but the underlying tools have difficulty verifying these programs. With unknown bounds, we end up with nonlinear arithmetic due to the flattening of multi-dimensional functions onto one-dimensional arrays. This is undecidable in general, so the SMT solvers that VERCORS relies on cannot handle it in general. We will investigate whether there are ways to tackle this in our domain-specific case.
Abstract. We present a gray-box fuzzing approach based on several new ideas. While standard gray-box fuzzing aims to cover all branches of the input program, our approach primarily aims to cover both results of each Boolean expression. To achieve this goal, we track the distances to flipping these results and we dynamically detect the input bytes that influence the distance. Then we use this information to efficiently flip the results. More precisely, we apply gradient descent on the detected bytes or we create new inputs by using detected bytes from different inputs. We implemented our approach in a tool called FIZZER. An evaluation on the benchmarks of Test-Comp 2023 shows that FIZZER is fully competitive with the winning tools of the competition, which use advanced formal methods like symbolic execution or bounded model checking, usually in combination with fuzzing.


1 Introduction 


Fuzzing is a technique for automated generation of test inputs for a given pro- 
gram. The goal of fuzzing is to generate tests with high code coverage and to 
quickly detect bugs in the code. We distinguish three basic kinds of fuzzing 
based on their use of the given program. Black-box fuzzing [18] only runs the 
given program on various inputs and observes the outputs. Gray-box fuzzing [18] 
first instruments the program to get some information about performed execu- 
tions. The instrumented code typically tracks the information about the basic 
blocks visited during the execution. While black-box and gray-box fuzzing rely 
on dynamic analysis of the original or instrumented code, white-box fuzzing [18]
combines dynamic analysis with some static analysis of the code, typically con- 
colic execution, symbolic execution, or bounded model checking. 

Black-box fuzzers have only limited efficiency due to the lack of information. Gray-box and white-box fuzzers have proved to be very efficient and are routinely applied in the software industry. For example, the gray-box fuzzer AFL [27] discovered dozens of bugs in many well-known open-source projects, and the white-box fuzzer SAGE [11] is used intensively at Microsoft.

The standard approach of successful gray-box fuzzers is to collect only very limited information about each program execution and to quickly perform as many executions as possible. In this paper, we suggest an approach that gathers slightly more information about program executions and uses it to select uncovered parts of the code and make more targeted attempts to cover them. We
can illustrate some ideas of our approach on a simple example. Consider a pro- 
gram that contains a branching statement if (x > 42) and assume that some 
program execution passed its true branch. During this program execution, we
save the value of x - 42 to know the distance to entering the false branch.
When we decide to cover the false branch, we first repeatedly execute the pro- 
gram on modified inputs to detect the bytes of the input that have some influence 
on the distance value. This is called a sensitivity analysis and the detected bytes 
are called sensitive. We then propose two analyses that use the sensitive bytes to 
cover the uncovered branch. One analysis performs a dynamic gradient descent 
on the sensitive bytes with the aim to minimize the absolute value of the distance 
and to enter the false branch. Alternatively, if we already know another input 
that entered the false branch of this statement in a different calling context, 
we can try to use the value of its sensitive bytes instead of the sensitive bytes 
of the current input. This analysis is called byteshare analysis. Now consider a 
slightly different program where the branching statement has the form if (res) 
where res is a Boolean variable assigned before by res = x > 42. Clearly, we 
want to track the distance to changing the value of res. Hence, we in fact track distances not for branching conditions but for values of atomic Boolean expressions. Roughly speaking, our approach aims to generate tests such that each atomic Boolean expression in each calling context is evaluated to true and to false in some program executions. Our fuzzing approach tracks its progress with the use of an atomic Boolean execution tree, and we talk about Boolean expression coverage.


The following section introduces the basic terminology used in the paper and 
states our assumptions on the analysed programs. Section 3 then describes the 
basic concepts of our approach, in particular the Boolean expression coverage, 
the information we collect from each program execution and how we obtain this 
information, the atomic Boolean execution tree, and the fuzzing algorithm. This 
algorithm iteratively tries to close vertices of the tree by generating inputs in 
which each of the vertices evaluates both to true and to false in order to 
either increase the Boolean expression coverage or to discover new parts of the 
tree. These inputs are generated by sensitivity analysis, byteshare analysis, and 
gradient descent analysis presented in Section 4. The selection strategy of the 
vertex to be closed is briefly explained in Section 5. Note that the page limit 
does not allow describing all the technical details of the approach. They can be 
found in the corresponding technical report [15]. 


We have implemented the presented fuzzing approach in a tool called FIZZER. 
The architecture and some implementation aspects of the tool are described in 
Section 6. Further, we have run FIZZER on all benchmarks of the Cover-Branches category of the Competition on Software Testing (Test-Comp) 2023 [5]. We evaluated the tests generated by FIZZER using the competition infrastructure, which measures the achieved branch coverage. The results presented in Section 7 show that our tool is competitive with the top-ranking tools of Test-Comp 2023, namely FUSEBMC [2], VERIFUZZ [21], and COVERITEST [6]. Note that our tool is a pure gray-box fuzzer, while FUSEBMC and VERIFUZZ combine dynamic analysis with static analyses like symbolic execution and bounded model
checking. COVERITEST fully relies on static methods like predicate analysis with 
the CEGAR loop and value analysis. Finally, Section 8 discusses some related 
work and Section 9 sums up the presented results and outlines future work. 


2 Preliminaries 


The ideas presented in this paper can be adopted for various kinds of programs. 
For ease of presentation, here we consider sequential C programs that get input 
only via functions nondet_char(), nondet_int(), and nondet_float() which 
return values of the corresponding type. 

For simplicity, we assume that these are the only types that can be read from the input and we define the set $\mathit{InputTypes} = \{\mathtt{char}, \mathtt{int}, \mathtt{float}\}$. We define the set of typed values $\mathit{TypedValues} = \{(v,t) \mid t \in \mathit{InputTypes},\ v \text{ is a value of type } t\}$ and denote the pairs $(v,t) \in \mathit{TypedValues}$ as $v : t$, e.g., $3 : \mathtt{int}$ is the value 3 of type int. We also work with untyped inputs, which are arbitrary finite sequences of bits 0 and 1. Untyped inputs are denoted by a standard language-theoretic notation, e.g., $1^{12}$ is a sequence of 12 elements 1.

An expression occurring in a program is called an atomic Boolean expression 
(ABE) if it has type bool and it is not a variable, not a call of a function whose 
definition is a part of the program, and not a result of applying logical operators, 
i.e., conjunction, disjunction, and negation. For example, the expression (x > 3) 
&& foo(x,y) && cond, where foo is a function defined in the program and cond 
is a variable, contains only one ABE x > 3. By ABE we always mean a particular 
occurrence of the expression in the program. 

We assume that the control flow is fully determined by the values of ABEs. 
This property may not hold for programs with switch statements, function 
calls via input-dependent function pointers, etc. However, such programs can be 
transformed into equivalent ones satisfying our assumption. 

By a calling context we mean the sequence of function calls that are currently 
being evaluated. The outermost function call is the first element of the sequence 
and the last one is the function whose body is executed at the moment. In other 
words, the calling context roughly corresponds to the call stack. 

We sometimes denote a sequence $x_1 x_2 \ldots x_n$ as $\langle x_1, x_2, \ldots, x_n \rangle$ or $(x_i)_{1 \le i \le n}$.


3 Overview of Our Fuzzing Approach 


This section provides an overview of the key concepts that are used in our fuzzing 
algorithm and presents the high-level view of the algorithm. The key heuristics 
for input generation are explained later in Sections 4 and 5. 
    void main() {
        int x = nondet_int();
        if (x < 42) {
            // branch 1
        } else {
            // branch 2
        }
    }

(a) Trivial case.

    void main() {
        int x = nondet_int();
        bool res1 = x > 42;
        x++;
        bool res2 = x < 42;
        if (res1 || res2) {
            // branch 1
        } else {
            // branch 2
        }
    }

(b) Depends on a non-local comparison.

    bool compare(int v) {
        return v < 42;
    }

    void main() {
        int x = nondet_int();
        bool res1 = compare(x);
        x++;
        bool res2 = compare(x);
        if (res1 || res2) {
            // branch 1
        } else {
            // branch 2
        }
    }

(c) Depends on a comparison coming from a different scope.

Listing 1.1: Example C code showing that the values influencing which branch is taken can be lexically far away from the branching statement and can be hidden behind several layers of indirection.


3.1 Branch Coverage via Boolean Expression Coverage 


The main idea of the proposed approach is to assign to each executed branching 
statement a metric called distance reflecting how far the current program state is 
from evaluating the branching expression to the opposite Boolean value. Thanks 
to this metric, we can use gradient descent to generate inputs that either flip the 
Boolean value or are close enough to the flipping point so that the actual flip 
can be achieved by small mutations of the input. 

It is easy to define the distance for branchings like if (x > 42): we set the distance to $x - 42$ and minimize the absolute value $|x - 42|$ to get close to the point where the result of the branching expression changes. However, as Listing 1.1
shows, the situation can be far more complex. The comparison does not have 
to occur in the branching expression itself, but it can be precomputed earlier in 
the program, it can come from a function call or be read from an array, etc. 

We sidestep this issue by assigning the distances to atomic Boolean expres- 
sions and trying to flip their values rather than doing the same for branch- 
ing expressions. In other words, we approach the goal of generating tests with 
maximal branch coverage indirectly by maximizing Boolean expression coverage. 
Intuitively, we try to generate a set of inputs such that every atomic Boolean 
expression evaluates to true on some input and to false on some input. In fact, 
we want to generate inputs leading to both Boolean values of each ABE in each 
possible calling context. The importance of the calling context is illustrated by the ABE v < 42 in the code of Listing 1.1c: we clearly want to distinguish the case when the value of v < 42 is used to set the value of res1 from the case when it is used to set the value of res2. The precise goal of our approach will
be formulated later using the terms atomic Boolean execution tree and covered 
vertex of the tree introduced in Definitions 1 and 2, respectively. 
Every time an ABE $e$ is evaluated, its distance is computed by the expression

$$
\mathit{dist}(e) =
\begin{cases}
\mathit{value}(l) - \mathit{value}(r), & \text{if } e = l \bowtie r \text{ where } {\bowtie} \in \{=, \neq, <, \leq, >, \geq\},\\
\mathit{value}(e), & \text{otherwise.}
\end{cases}
$$

In the first case, $\mathit{value}(l)$ and $\mathit{value}(r)$ refer to the numerical values of $l$ and $r$, respectively, before the evaluation of $e$. In the second case, $\mathit{value}(e)$ is defined as 1 if $e$ evaluates to true and as 0 if $e$ evaluates to false.
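To make the definition concrete, here is a minimal C sketch of dist(e). The AbeKind enum and the name abe_distance are hypothetical and not taken from FIZZER. Operand values are widened to double, which is our own choice: char, int and float operands then share one code path, and l - r cannot overflow for int operands.

```c
#include <stdbool.h>

/* Hypothetical tag for the shape of an ABE: a comparison l op r, or anything else. */
typedef enum { CMP_EQ, CMP_NE, CMP_LT, CMP_LE, CMP_GT, CMP_GE, NOT_A_COMPARISON } AbeKind;

/* dist(e): value(l) - value(r) for comparisons, otherwise 1 if e is true and 0 if false.
   l and r are the operand values before e is evaluated, widened to double. */
static double abe_distance(AbeKind kind, double l, double r, bool result) {
    if (kind != NOT_A_COMPARISON)
        return l - r;              /* the operator itself does not change the distance */
    return result ? 1.0 : 0.0;
}
```

For the branch if (x > 42) with x = 50, abe_distance(CMP_GT, 50, 42, true) yields 8; driving this value towards 0 approaches the point where the comparison can change its result.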

Note that branch coverage and atomic Boolean expression coverage do not precisely match. For example, we can achieve full branch coverage of the code in Listing 1.1b with two tests, with input values 40 : int and 41 : int. However, neither of these tests makes the first ABE (the one whose value is stored in res1) evaluate to true. Nevertheless, our experimental evaluation shows that maximizing atomic Boolean expression coverage also leads to test inputs with high branch coverage.


3.2 Instrumentation and Execution 


From each program execution, our approach needs to get the sequence of eval- 
uated ABEs including their calling contexts, their Boolean values, and their dis- 
tances. To obtain this information, the program is instrumented with the follow- 
ing functions: 


- To track the calling context, we assign a unique identifier id to each function call (except nondet_* function calls) and insert __instr_call(id) before the call and __instr_return() after the call. The inserted function calls maintain the current stack of open function calls.

- To track all evaluated ABEs and their values, distances, and calling contexts, we assign a unique identifier id to each ABE e and insert the call __instr_abe(id, e, dist(e)) before the ABE. The calling context is internally retrieved from the tracked stack of open function calls.
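A possible runtime behind these hooks is sketched below in C. Only the names __instr_call, __instr_return and __instr_abe come from the text; the fixed-size arrays, the AbeEval record and the limits MAX_DEPTH and MAX_EVALS are assumptions of this sketch (the evaluation limit mirrors the bound that keeps traces finite). The input byte counter n_i is omitted here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 64     /* assumed bound on the stored calling-context length */
#define MAX_EVALS 1024   /* assumed bound on recorded ABE evaluations */

typedef struct {
    unsigned abe_id;          /* e_i: identifier of the ABE occurrence */
    unsigned ctx[MAX_DEPTH];  /* c_i: ids of the open calls, outermost first */
    size_t   ctx_len;
    bool     result;          /* r_i */
    double   dist;            /* d_i */
} AbeEval;

static unsigned call_stack[MAX_DEPTH];
static size_t   call_depth;   /* may exceed MAX_DEPTH; deeper ids are not stored */
static AbeEval  trace[MAX_EVALS];
static size_t   trace_len;

void __instr_call(unsigned id) {
    if (call_depth < MAX_DEPTH)
        call_stack[call_depth] = id;
    call_depth++;
}

void __instr_return(void) {
    if (call_depth > 0)
        call_depth--;
}

void __instr_abe(unsigned id, bool result, double dist) {
    if (trace_len >= MAX_EVALS)
        return;                               /* evaluation limit reached: stop recording */
    AbeEval *ev = &trace[trace_len++];
    ev->abe_id  = id;
    ev->ctx_len = call_depth < MAX_DEPTH ? call_depth : MAX_DEPTH;
    memcpy(ev->ctx, call_stack, ev->ctx_len * sizeof call_stack[0]);
    ev->result  = result;
    ev->dist    = dist;
}
```

The hook names follow the text; since identifiers starting with two underscores are formally reserved in C, a real runtime might choose different names.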


Listing 1.2 provides the instrumented programs from Listings 1.1b and 1.1c. 

Besides the inserted function calls, we also alter the functions nondet_type() to collect information about the values and types read from the input stream and about when they were read.

In the following, we assume that there exists a function execute(P', input) that gets an instrumented program P' and an untyped input $\mathit{input} \in \{0,1\}^*$ and returns the trace of the execution of P' on $\mathit{input}.0^\omega$, i.e., input extended with infinitely many zero bits. The trace is a pair $(\mathit{usedInput}, \pi)$ where

- usedInput is the sequence of TypedValues that were read by the program P' during the execution.

- $\pi$ is the sequence $\langle (e_i, c_i, r_i, d_i, n_i) \rangle_{1 \le i \le k}$ of tuples, where each tuple represents one evaluation of an ABE: $e_i$ is the evaluated ABE, $c_i$ is the calling context in which it was evaluated, $r_i$ is the result of the evaluation, $d_i$ is the corresponding value of $\mathit{dist}(e_i)$, and $n_i$ is the number of bytes of the input that have been read before the evaluation.
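The altered nondet_type() functions and the zero-extended input can be sketched as follows. This is a hedged illustration: the global cursor, the byte layout (each value occupies sizeof(type) bytes in native byte order) and the used_types log are our assumptions, not details given in the text; for brevity, only the types of usedInput are logged, not the values. The cursor bytes_read is what a runtime would copy into $n_i$ at each ABE evaluation.

```c
#include <stddef.h>

typedef enum { TY_CHAR, TY_INT, TY_FLOAT } InputType;

#define MAX_USED 256                      /* assumed bound on logged reads */

static const unsigned char *input_bytes;  /* the untyped input, as bytes */
static size_t input_len;
static size_t bytes_read;                 /* bytes consumed so far (n_i) */
static InputType used_types[MAX_USED];    /* types of usedInput, in order */
static size_t used_count;

/* Copy k bytes from the stream; past the end of the input, read zero bytes. */
static void read_stream(void *dst, size_t k) {
    unsigned char *p = dst;
    for (size_t i = 0; i < k; i++, bytes_read++)
        p[i] = bytes_read < input_len ? input_bytes[bytes_read] : 0;
}

static void log_read(InputType t) {
    if (used_count < MAX_USED)
        used_types[used_count++] = t;
}

char nondet_char(void) {
    char c;
    read_stream(&c, sizeof c);
    log_read(TY_CHAR);
    return c;
}

int nondet_int(void) {
    int v;
    read_stream(&v, sizeof v);
    log_read(TY_INT);
    return v;
}

float nondet_float(void) {
    float f;
    read_stream(&f, sizeof f);
    log_read(TY_FLOAT);
    return f;
}
```

With an empty input, the first nondet_int() returns 0 and advances the cursor by sizeof(int) bytes (4 on common platforms), which matches the four bytes reported in Example 1.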
    void main() {
        int x = nondet_int();
        __instr_abe(1, x > 42, x - 42);
        bool res1 = x > 42;
        x++;
        __instr_abe(2, x < 42, x - 42);
        bool res2 = x < 42;
        if (res1 || res2) {
            // branch 1
        } else {
            // branch 2
        }
    }

(a) Instrumentation of Listing 1.1b

    bool compare(int v) {
        __instr_abe(1, v < 42, v - 42);
        return v < 42;
    }

    void main() {
        int x = nondet_int();
        __instr_call(1);
        bool res1 = compare(x);
        __instr_return();
        x++;
        __instr_call(2);
        bool res2 = compare(x);
        __instr_return();
        if (res1 || res2) {
            // branch 1
        } else {
            // branch 2
        }
    }

(b) Instrumentation of Listing 1.1c

Listing 1.2: Instrumented programs from Listings 1.1b and 1.1c


Note that the trace is always finite, as P' is executed with some limits on the number of evaluated ABEs.


Example 1. Let P' be the instrumented program from Listing 1.2b. The function execute(P', $0^8$) returns the trace $(\langle 0 : \mathtt{int} \rangle, \pi)$, where


$$\pi = \langle (v < 42, \langle 1 \rangle, \mathit{true}, -42, 4), (v < 42, \langle 2 \rangle, \mathit{true}, -41, 4) \rangle.$$

In other words, the execution read only a single int of value 0 from the input, and these 4 bytes were read before the first ABE evaluation. Further, the ABE v < 42 with identifier 1 (for readability denoted directly by the expression) has been evaluated twice: once in the calling context $\langle 1 \rangle$ with value true and distance $-42$, and later in the calling context $\langle 2 \rangle$ with value true and distance $-41$.


3.3 Atomic Boolean Execution Tree 


Each execution trace $(\mathit{usedInput}, \langle (e_i, c_i, r_i, d_i, n_i) \rangle_{1 \le i \le k})$ determines the sequence $r_1 r_2 \ldots r_k$ of ABE values. Our fuzzing approach tracks the information about all such sequences seen so far by maintaining an atomic Boolean execution tree.


Definition 1 (atomic Boolean execution tree, ABET). An atomic Boolean execution tree (ABET) is a nonempty prefix-closed finite set $T \subseteq \{\mathit{true}, \mathit{false}\}^*$. Elements of $T$ are vertices, $\varepsilon$ is the root, and the elements $v.\mathit{true}$ and $v.\mathit{false}$ are children of $v$. We assume that each vertex is either a leaf or has two children, i.e., for each $v \in T$ it holds that $v.\mathit{true} \in T \iff v.\mathit{false} \in T$.


Our method starts with the tree $T = \{\varepsilon\}$. Whenever we obtain a trace $(\mathit{usedInput}, \langle (e_i, c_i, r_i, d_i, n_i) \rangle_{1 \le i \le k})$, we update $T$ to contain the sequence $r_1 \ldots r_k$ and all its prefixes. Further, with each newly added vertex we also add its sibling. We say that the trace visits a vertex $v$ if $v$ is a prefix of $r_1 r_2 \ldots r_k$.

[Figure: the root ε is labelled expr: v < 42, ctx: ⟨1⟩, trc: (⟨0 : int⟩, π); its child true is labelled expr: v < 42, ctx: ⟨2⟩, trc: (⟨0 : int⟩, π); true.true is a leaf.]
Fig. 1: An example of an ABET.

As we mentioned in the preliminaries, we assume that the next evaluated ABE of each program is fully determined by the values of ABEs evaluated before it. This means that each inner vertex $v \in T$ determines the corresponding ABE and its calling context. The ABE and the calling context corresponding to $v$ are denoted by $\mathit{expr}(v)$ and $\mathit{ctx}(v)$. We extend the notation $\mathit{expr}(v)$ also to leaves. We set $\mathit{expr}(v) = \mathit{end}$ if we have seen a trace with the sequence $v$ of ABE values. If this is not the case and $v$ is in $T$ only because of its sibling (or as the only node in the initial tree $\{\varepsilon\}$), we set $\mathit{expr}(v) = \bot$. Note that a leaf $v$ with $\mathit{expr}(v) = \bot$ can become a leaf with $\mathit{expr}(v) = \mathit{end}$ or even an inner node if we later obtain a trace that visits $v$. Similarly, a leaf $v$ with $\mathit{expr}(v) = \mathit{end}$ can become an inner node. This happens, for example, when $v$ originally represents a trace that ends with an error (e.g., division by zero) and later we find a longer trace visiting $v$ that avoids the error.

Finally, with each inner vertex $v \in T$ we associate some trace that visits it. The trace is denoted as $\mathit{trc}(v)$.

An example of an ABET can be found in Figure 1. It represents the tree for the instrumented program from Listing 1.2b after obtaining the first trace $(\langle 0 : \mathtt{int} \rangle, \pi)$ given in Example 1.


Definition 2 (Covered vertex). An inner vertex $v \in T$ is said to be covered if there are inner vertices $v_t, v_f \in T$ satisfying $\mathit{expr}(v) = \mathit{expr}(v_t) = \mathit{expr}(v_f)$, $\mathit{ctx}(v) = \mathit{ctx}(v_t) = \mathit{ctx}(v_f)$, $\mathit{expr}(v_t.\mathit{true}) \neq \bot$, and $\mathit{expr}(v_f.\mathit{false}) \neq \bot$. An inner vertex that is not covered is called uncovered.


Definition 3 (Open/closed vertex). An inner vertex $v \in T$ is said to be open if $\mathit{expr}(v.\mathit{true}) = \bot$ or $\mathit{expr}(v.\mathit{false}) = \bot$. An inner vertex that is not open is called closed.
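One possible in-memory representation of an ABET is a binary tree whose vertices store expr(v). The C sketch below is our own hypothetical data structure (ABEs are identified by non-negative integer ids, and the calling context and trc(v) are omitted for brevity); it performs the tree update from Section 3.3, including the sibling rule and the ⊥/end markers, and checks Definition 3.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

enum { EXPR_BOTTOM = -1, EXPR_END = -2 };   /* bottom and end; ABE ids are >= 0 */

typedef struct Vertex {
    int expr;                   /* expr(v) */
    struct Vertex *child[2];    /* child[0] = v.false, child[1] = v.true */
} Vertex;

static Vertex *vertex_new(void) {
    Vertex *v = calloc(1, sizeof *v);
    if (v)
        v->expr = EXPR_BOTTOM;
    return v;
}

/* Add the value sequence r_1..r_k of one trace, whose ABE ids are abe[0..k-1].
   Returns false if memory runs out. */
static bool abet_add_trace(Vertex *root, const int *abe, const bool *r, size_t k) {
    Vertex *v = root;
    for (size_t i = 0; i < k; i++) {
        v->expr = abe[i];                          /* v is now an inner vertex */
        if (!v->child[0]) v->child[0] = vertex_new();  /* children come in pairs */
        if (!v->child[1]) v->child[1] = vertex_new();
        if (!v->child[0] || !v->child[1])
            return false;
        v = v->child[r[i] ? 1 : 0];
    }
    if (v->expr == EXPR_BOTTOM)
        v->expr = EXPR_END;                        /* a trace ended in this vertex */
    return true;
}

/* Definition 3: an inner vertex is open if one of its children is still bottom. */
static bool abet_is_open(const Vertex *v) {
    return v->child[0] != NULL && v->child[1] != NULL &&
           (v->child[0]->expr == EXPR_BOTTOM || v->child[1]->expr == EXPR_BOTTOM);
}
```

Adding the value sequence true.true from Example 1 to a fresh root gives the shape of Figure 1: the root and the vertex true are labelled with the same ABE, true.true is marked end, and the false siblings stay ⊥, so the root is still open.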


3.4 Fuzzing Algorithm 


A high-level description of the fuzzing algorithm is given in Algorithm 1. The algorithm starts with instrumentation of the given program P (line 1) and initialization of the ABET $T$ (line 2). Then it executes the instrumented program on the stream of zero bits (line 3) to obtain an initial trace $(\mathit{usedInput}, \pi)$ where $\pi$ is of the form $\langle (e_i, c_i, r_i, d_i, n_i) \rangle_{1 \le i \le k}$.

Algorithm 1 Fuzzing algorithm
1: create instrumented program P' from P (see Section 3.2)
2: $T \leftarrow \{\varepsilon\}$
3: $(\mathit{usedInput}, \pi) \leftarrow \mathrm{execute}(P', \varepsilon)$
4: processTrace$(\mathit{usedInput}, \pi)$
5: while some inner vertex of $T$ is not covered do
6:     select an unprocessed open vertex $v$ from $T$ (see Section 5)
7:     if no $v$ is selected then end test generation
8:     try to close $v$ in $T$ using an input generation analysis (see Section 4)

On line 4, the trace is processed by processTrace$(\mathit{usedInput}, \pi)$. This function updates $T$ with the sequence $r_1 r_2 \ldots r_k$ as described in Section 3.3. For each inner vertex $v \in T$ visited by the current trace and not visited by any trace before, we set $\mathit{trc}(v)$ to the current trace. Further, for each vertex $v \in T$ visited by the current trace and by another trace before, we compute the value $\sum_i d_i^2$ and, if it is smaller than the corresponding value for $\mathit{trc}(v)$, we set $\mathit{trc}(v)$ to the current trace. Our practical experiments showed that keeping the trace with the smaller sum of squares of $d_i$ leads to better results than minimizing only the current distance $|d_{|v|+1}|$ at $v$. Finally, the function processTrace$(\mathit{usedInput}, \pi)$ saves usedInput to the output test suite if the trace is $\mathit{trc}(v)$ of some vertex $v$ at this moment. Otherwise, the trace is completely discarded.
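The replacement rule for trc(v) compares sums of squared distances. In the hypothetical helpers below, we assume that the sum ranges over the first n evaluations of each trace (for example n = |v| + 1, up to and including the evaluation at v); since the exact range is not spelled out here, n is left as a parameter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sum of squared distances d_1^2 + ... + d_n^2 of the first n ABE evaluations. */
static double sum_sq(const double *d, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += d[i] * d[i];
    return s;
}

/* True if a new trace visiting v should replace trc(v): its sum of squared
   distances over the first n evaluations is strictly smaller. */
static bool should_replace_trc(const double *new_d, const double *old_d, size_t n) {
    return sum_sq(new_d, n) < sum_sq(old_d, n);
}
```

Using a strict comparison keeps the older trace on ties, so trc(v) only changes when the new trace is genuinely closer overall.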

The main fuzzing loop (line 5) iterates until all inner vertices of $T$ are covered. In each iteration, we select an unprocessed open vertex $v \in T$ (line 6). A vertex is processed if it has been analyzed by all input generation analyses. If we fail to select $v$, the fuzzing algorithm terminates (line 7). Otherwise, we try to close $v$ by some input generation analysis (line 8). The selection process and the input generation analyses are described in Sections 5 and 4, respectively.


4 Input Generation 


We propose three methods to generate new inputs with the aim to close the 
selected vertex: sensitivity analysis, byteshare analysis, and gradient descent. 
When a vertex is selected, we execute the first of these analyses that has not 
been executed yet for the vertex. The order is important, as byteshare and gra- 
dient descent analyses need the information about sensitive bytes, and byteshare 
analysis is significantly cheaper than the gradient descent analysis. 

In all the analyses, $v$ is the vertex we want to close, we assume without loss of generality that $\mathit{expr}(v.\mathit{true}) = \bot$, and we define $l = |v| + 1$, i.e., the depth of $v$ when the root is at depth 1, which is also the position in $\pi$ of the evaluation of $\mathit{expr}(v)$. The goal of all analyses is to generate an input for which the resulting trace visits $v$ and continues to $v.\mathit{true}$. In all the analyses, $\mathit{trc}(v) = (\mathit{usedInput}, \pi)$ denotes the current trace assigned to the vertex $v$, with the typed values read by the trace $\mathit{usedInput} = \langle in_i : t_i \rangle_{1 \le i \le n}$ and the sequence of ABE evaluations $\pi = \langle (e_i, c_i, r_i, d_i, n_i) \rangle_{1 \le i \le k}$. Moreover, whenever any of the analyses executes P', the resulting execution trace is processed by the processTrace function.


4.1 Sensitivity Analysis 


The goal of the analysis is twofold. First, it detects the so-called sensitive bytes of vertex $v$, denoted as $\mathit{sbytes}(v)$. Let us denote as $b_{i,j}$ the $j$-th byte in the $i$-th typed value $in_i$. We check whether $b_{i,j}$ is sensitive by mutating each bit of the byte $b_{i,j}$ separately and executing the program P' on each one-bit mutation. If the resulting trace with $\pi' = \langle (e'_i, c'_i, r'_i, d'_i, n'_i) \rangle_{1 \le i \le k'}$ still visits $v$ and the value of the distance function in the vertex $v$ changes, i.e., $d'_l \neq d_l$, the whole byte is considered sensitive and is added to $\mathit{sbytes}(v)$. We also try changing the whole value $in_i$ to several selected special values, e.g., the smallest and the greatest value of the type $t_i$ and special floating-point values if $t_i$ is float.
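The one-bit mutation test for a single byte can be sketched in C as follows. The callback type DistanceAtVertex is a hypothetical stand-in for "execute P' and read the distance at v", returning INFINITY when the trace does not visit v; toy_distance is an invented target whose ABE depends only on the first input byte. The special-value probes mentioned above are not shown.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hook: run P' on the raw input and return the distance d'_l at
   vertex v, or INFINITY if the resulting trace does not visit v. */
typedef double (*DistanceAtVertex)(const unsigned char *input, size_t len);

/* Byte `pos` is sensitive if some one-bit mutation keeps the trace at v
   but changes the distance there (d'_l != d_l). */
static bool byte_is_sensitive(unsigned char *input, size_t len, size_t pos,
                              DistanceAtVertex run, double d_l) {
    for (unsigned bit = 0; bit < 8; bit++) {
        unsigned char mask = (unsigned char)(1u << bit);
        input[pos] ^= mask;                 /* one-bit mutation */
        double d = run(input, len);
        input[pos] ^= mask;                 /* restore the original byte */
        if (isfinite(d) && d != d_l)
            return true;
    }
    return false;
}

/* Toy stand-in for P': v is always visited and its ABE is input[0] > 42. */
static double toy_distance(const unsigned char *input, size_t len) {
    return len > 0 ? (double)input[0] - 42.0 : INFINITY;
}
```

The input buffer is restored after every probe, so the caller can test all bytes of $in_i$ one after another on the same buffer.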

Second, during the computation of sensitive bytes, we also extend the tree 
with each executed trace. The sensitivity analysis therefore also effectively works 
as a local neighborhood search around the previous input of the vertex v. 

Observe that when computing sensitive bytes of the vertex v, we can simul- 
taneously use the resulting traces to determine the sensitive bytes of all prede- 
cessors of v. We use this observation as an optimization in the implementation 
to reduce the number of sensitivity analysis executions. 


4.2 Byteshare Analysis 


Let $u$ be an inner vertex of the current tree $T$ with the same ABE as $v$ (the contexts may differ), with a non-empty set of sensitive bytes, and whose successor $u.\mathit{true}$ is not a leaf. For each such vertex $u$, the analysis combines inputs from $\mathit{trc}(u.\mathit{true})$ and $\mathit{trc}(v)$ into a new input. More precisely, the new input is the same as the input of $\mathit{trc}(v)$, but for each $j \in \{1, 2, \ldots, \min(|\mathit{sbytes}(v)|, |\mathit{sbytes}(u)|)\}$, we replace the value of the $j$-th sensitive byte of $v$ by the value of the $j$-th sensitive byte of $u$ in $\mathit{trc}(u.\mathit{true})$. The idea behind this construction is that we keep the
new input similar to the original input of trc(v) so that the execution trace will 
likely visit v, but we replace the sensitive bytes of v by those of u.true, which 
might steer the execution to the desired child. 

Note that sbytes(v) and sbytes(u) may be completely different bytes. The 
size of the sets may also differ. Since we lack information for building a mapping 
between sbytes(v) and sbytes(u), we simply map the bytes based on their order. 
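The order-based mapping can be sketched in C as below. SensitiveInput and byteshare_combine are hypothetical names; we assume that sensitive bytes are given as byte offsets in increasing order and that the offsets of sbytes(u) are valid positions in the input of trc(u.true).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical view of an input together with the offsets of the sensitive
   bytes of a vertex, listed in increasing order. */
typedef struct {
    const unsigned char *bytes;
    size_t len;
    const size_t *sbytes;   /* offsets into bytes */
    size_t n_sbytes;
} SensitiveInput;

/* out (of size v->len) := input of trc(v) whose j-th sensitive byte is replaced
   by the j-th sensitive byte of u read from the input of trc(u.true). */
static void byteshare_combine(unsigned char *out, const SensitiveInput *v,
                              const SensitiveInput *u_true) {
    memcpy(out, v->bytes, v->len);
    size_t n = v->n_sbytes < u_true->n_sbytes ? v->n_sbytes : u_true->n_sbytes;
    for (size_t j = 0; j < n; j++) {
        size_t dst = v->sbytes[j], src = u_true->sbytes[j];
        if (dst < v->len && src < u_true->len)   /* skip offsets outside either input */
            out[dst] = u_true->bytes[src];
    }
}
```

All bytes that are not sensitive for $v$ are copied unchanged, which is what keeps the new input close to the one that already reaches $v$.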


4.3 Gradient Descent with Multi-sampling and Locking 


Algorithm 2 Gradient descent for vertex $v$ from seed $x$ and distance $f(x)$
1: while $v$ is open and the number of steps is below the predefined bound do
2:     for all $i \in \{1, \ldots, m\}$ do
3:         compute $\nabla_i f(x)$ as $\dfrac{|\mathrm{ComputeDistance}(x_1, \ldots, x_{i-1}, x_i + \Delta x_i, x_{i+1}, \ldots, x_m)| - |f(x)|}{\Delta x_i}$
4:     lock each $\nabla_i f(x)$ which is not finite
5:     while $\|\nabla f(x)\|^2$ is finite and non-zero do
6:         $\lambda \leftarrow |f(x)| / \|\nabla f(x)\|^2$
7:         if $\lambda$ is zero or not finite then return
8:         $V' \leftarrow \emptyset$
9:         for all $e \in \{0, -1, 1, -2, 2, -3, 3\}$ do
10:            $x' \leftarrow x - 10^{e} \lambda \nabla f(x)$
11:            $V' \leftarrow V' \cup \{(x', \mathrm{ComputeDistance}(x'))\}$
12:        let $(x', f(x')) \in V'$ be the pair with the smallest finite $|f(x')|$
13:        if $|f(x')| < |f(x)|$ then
14:            $x \leftarrow x'$, $f(x) \leftarrow f(x')$
15:            break
16:        else
17:            lock all extreme coordinates $\nabla_i f(x)$
18:            if no coordinate was locked then return

We extend the notion of sensitivity to the typed inputs. An element of the sequence usedInput is called sensitive in $v$ if it contains at least one byte sensitive in $v$. The gradient descent analysis tries to minimize the absolute value of the distance for $v$ by changing only the sensitive typed inputs of the vertex $v$. We fix the values of the inputs that were not identified as sensitive, as they likely do not influence the value of the distance. In particular, we minimize the function $f(x)$ that receives an input vector $x$ of $m$ values that correspond to the sensitive inputs of the vertex $v$. The value of the function $f(x)$ is computed by a function ComputeDistance$(x)$ that:


1. Creates the input sequence $\mathit{input}'$ by replacing the sensitive inputs of the original input from $\mathit{trc}(v)$ by the values specified in $x$.
2. Executes the program on $\mathit{input}'$ and obtains the trace $(\mathit{usedInput}', \pi')$, where $\pi' = \langle (e'_i, c'_i, r'_i, d'_i, n'_i) \rangle_{1 \le i \le k'}$.
3. If the trace $\pi'$ does not visit $v$, returns $\infty$.
4. Otherwise returns the obtained distance value at the vertex $v$, i.e., $d'_l$.
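The four steps of ComputeDistance can be sketched in C as follows. RunToVertex, MAX_INPUTS and toy_run are hypothetical; as a simplification, all typed inputs are widened to double, whereas the real inputs keep their types, and processing of the resulting trace by processTrace is not shown.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_INPUTS 64   /* assumed bound on the length of usedInput */

/* Hypothetical executor: runs P' on the typed inputs (widened to double here);
   returns false if the trace misses v, otherwise stores d'_l in *dist_at_v. */
typedef bool (*RunToVertex)(const double *inputs, size_t n, double *dist_at_v);

/* ComputeDistance(x): orig is usedInput of trc(v), sens[k] is the position in
   usedInput of the k-th sensitive input, and x holds its m new values. */
static double compute_distance(RunToVertex run, const double *orig, size_t n,
                               const size_t *sens, const double *x, size_t m) {
    double input[MAX_INPUTS];
    if (n > MAX_INPUTS)
        return INFINITY;
    memcpy(input, orig, n * sizeof orig[0]);     /* step 1: start from trc(v) */
    for (size_t k = 0; k < m; k++)
        if (sens[k] < n)
            input[sens[k]] = x[k];               /*         and plug in x    */
    double d;
    if (!run(input, n, &d))                      /* steps 2 and 3 */
        return INFINITY;
    return d;                                    /* step 4: d'_l */
}

/* Toy P': v is reached only if inputs[0] >= 0; its ABE is inputs[1] > 42. */
static bool toy_run(const double *inputs, size_t n, double *dist_at_v) {
    if (n < 2 || inputs[0] < 0)
        return false;
    *dist_at_v = inputs[1] - 42.0;
    return true;
}
```

Returning INFINITY for traces that miss $v$ lets the gradient descent treat them uniformly as unusable samples.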




The search for the desired values of $x$ is motivated by the following idea. If $x$ is chosen from a small neighborhood around the global minimum of $|f(x)|$, the value $f(x)$ has roughly the same chance of being positive as negative. That is, there is roughly the same chance of $\mathit{expr}(v)$ being evaluated to true as to false. Therefore, we repeatedly run the gradient descent from randomly chosen seeds $x$ to approach the minimum. Along the way, we perform sampling in the descent direction. This sampling also helps to escape from local minima by trying more values of the function $f(x)$.

Our gradient descent starting from one random seed $x$ is formally described in Algorithm 2. We repeatedly perform gradient descent steps from the initial seed $x$ until we generate an input that closes the open vertex $v$ or reach the predefined bound on gradient descent steps. In the loop at line 2, we numerically compute the coordinates $\nabla_i f(x)$, one for each variable $x_i$, of the gradient vector $\nabla f(x)$. The coordinates are computed using forward differences, where $\Delta x_i > 0$ is the smallest change of that variable. Since the algorithm works only with finite values, all non-finite coordinates $\nabla_i f(x)$ are locked, i.e., they are set to zero and we do not move in these coordinates in the gradient step.
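Lines 2 to 4 of Algorithm 2 can be sketched in C as below. The Objective type stands in for ComputeDistance, and toy_objective is an invented linear distance $x_1 + 2x_2 - 10$; the per-coordinate steps dx are supplied by the caller. We assume locks are recomputed from scratch each time the gradient is computed.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical objective: f(x) = ComputeDistance(x), INFINITY if v is missed. */
typedef double (*Objective)(const double *x, size_t m);

static double abs_val(double a) { return a < 0 ? -a : a; }

/* Forward-difference estimate of the gradient of |f| at x with steps dx[i] > 0.
   Non-finite coordinates are locked: set to zero and flagged in locked[i].
   x is modified temporarily and restored before returning. */
static void gradient_with_locking(Objective f, double *x, size_t m, const double *dx,
                                  double fx, double *grad, bool *locked) {
    for (size_t i = 0; i < m; i++) {
        double saved = x[i];
        x[i] = saved + dx[i];
        double fi = f(x, m);
        x[i] = saved;
        grad[i] = (abs_val(fi) - abs_val(fx)) / dx[i];
        locked[i] = !isfinite(grad[i]);
        if (locked[i])
            grad[i] = 0.0;
    }
}

/* Toy objective over two sensitive inputs. */
static double toy_objective(const double *x, size_t m) {
    return m == 2 ? x[0] + 2.0 * x[1] - 10.0 : INFINITY;
}
```

At the origin, $|f| = 10$, and the forward differences give the gradient $(-1, -2)$ of $|f|$, pointing towards larger $|f|$ as expected.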

The loop at line 5 performs a single gradient step. It first computes the value of the learning rate λ at line 6. This value has the property that the linear approximation of the function f at x is zero at the input x − λ∇f(x). Next, we compute a set X′ of samples x′ (see the loop at line 9), each representing a candidate for the gradient step. Observe that the samples are separated by multipliers 10^k ranging over several orders of magnitude. These are the samples we mentioned earlier, which can both explore the small neighborhood of the global minimum and escape from local minima. Only the sample x′ with the smallest |f(x′)| is considered in the gradient step (see line 12). If none of the samples decreases the value of the function, we are stuck in a local minimum and try to escape it by locking more coordinates of the gradient. Namely, we identify and lock coordinates with high absolute values compared to the others, as they dominate the descent direction. By locking them, we can dramatically change the descent direction and potentially move towards the global minimum. If all coordinates are locked, i.e., set to zero, then ‖∇f(x)‖² = Σ_i (∇_i f(x))² is zero and the gradient descent terminates.
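A single gradient step can be sketched as follows, under simplifying assumptions. Setting the linear approximation f(x) − λ‖∇f(x)‖² of f(x − λ∇f(x)) to zero gives λ = f(x)/‖∇f(x)‖². The candidate samples x − 10^k·λ∇f(x) use a hypothetical exponent range k from −3 to 3. The helper `lock_dominant`, which zeroes the coordinates of largest magnitude, is our own simplified stand-in for the locking heuristic, whose exact threshold is not given here.

```python
from typing import Callable, List, Optional, Sequence


def gradient_step(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    grad: Sequence[float],
    exponents: Sequence[int] = range(-3, 4),
) -> Optional[List[float]]:
    """Return the sample with the smallest |f| along -grad, or None if no sample improves |f(x)|."""
    norm2 = sum(g * g for g in grad)
    if norm2 == 0.0:
        return None  # every coordinate is locked: the descent terminates
    fx = f(x)
    lam = fx / norm2  # zero of the linear approximation f(x) - lam * ||grad||^2
    best: Optional[List[float]] = None
    best_val = abs(fx)
    for k in exponents:
        scale = (10.0 ** k) * lam  # samples span several orders of magnitude
        cand = [xi - scale * gi for xi, gi in zip(x, grad)]
        val = abs(f(cand))
        if val < best_val:  # keep only strict improvements
            best, best_val = cand, val
    return best


def lock_dominant(grad: Sequence[float]) -> List[float]:
    """Hypothetical locking rule: zero out the coordinate(s) of largest magnitude."""
    top = max(abs(g) for g in grad)
    return [0.0 if abs(g) == top else g for g in grad]
```

When `gradient_step` returns None, a caller would apply `lock_dominant` and retry. Once every coordinate is zero, the step returns None unconditionally and the descent stops.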

The gradient descent algorithm is repeatedly called with randomly chosen seed inputs x and the starting distance f(x) = ComputeDistance(x). This continues until the target vertex is closed¹ or we exceed the predefined bound on the number of seeds to try. We skip all the seeds x for which ComputeDistance(x) is infinite. More details of the algorithm can be found in the technical report [15].


5 Target Vertex Selection 


We now briefly describe how we select the vertices that are targeted by the analyses from the previous section. First, the heuristic tries to select a suitable uncovered vertex that has not been processed yet. Second, if all uncovered vertices have been processed, it means that none of the analyses was able to cover them. In this case, we try to select an open unprocessed vertex and close it. The detailed description of the selection process is available in the technical report [15].


5.1 Selecting an Uncovered Vertex 


Primarily, we want to target uncovered vertices. Before that, we want to explore program executions with diverse numbers of loop iterations. To this end, we would like to identify all loop head vertices in the ABET, which can be expensive. Therefore, we perform loop head detection lazily, on the fly. We maintain a worklist H of loop heads. If it is not empty, we remove a random vertex from it and select it as the target. Only if the worklist H is empty do we select a suitable vertex v in the tree, based on the vertex selection heuristics, and detect loop heads on the path to v. If there are loop heads on the path to v, we put some of them into H based on the loop head selection heuristics and randomly take one of them as the target vertex. If there are no loop heads on the path to v, or the loop heads on the path to v have already been processed, we select v itself as the target vertex. We now describe the heuristics that we use for selecting the suitable vertex v and for selecting loop heads on the path to v.

¹ In fact, Algorithm 2 is immediately terminated when the target vertex is closed by any execution of ComputeDistance.


Vertex Selection Heuristics The selection relies on the classification of the uncovered vertices into three categories:

- input-sensitive vertices with sbytes(v) ≠ ∅,
- input-insensitive vertices with sbytes(v) = ∅, and
- vertices with unknown sensitivity, on which the sensitivity analysis has not been performed yet.

Additionally, we call a vertex likely input-insensitive (LII) if it has unknown sensitivity and there is an input-insensitive vertex with the same ABE and calling context in the current ABET.

Input-insensitive vertices often arise in practice. For example, when processing the loop for (int i = 0; i < 1000; ++i), all the ABET vertices with the ABE i < 1000 will be input-insensitive, as the number of iterations does not depend on the input. Moreover, both the byteshare and the gradient descent analyses are useless on input-insensitive vertices. We therefore prefer not to process LII vertices, to avoid useless sensitivity computations. However, LII vertices cannot be ignored completely, as they can in fact be input-sensitive. For this reason, we first try selecting uncovered vertices that are either input-sensitive, or have unknown sensitivity but are not LII. We sort such vertices lexicographically according to the following criteria and select the best vertex v.


1. Input-sensitive vertices are preferred to vertices with unknown sensitivity as 
we want to exploit the computed information about sensitive bytes. 

2. Vertices with fewer sensitive bytes are preferred, as the analyses are more 
expensive with more sensitive bytes. 

3. Vertices whose number of input bytes is closer to half of the maximal number of input bytes over all ABET vertices are preferred. This helps to explore loop iterations that are deep enough to be interesting while keeping the number of input bytes reasonably small.

4. Vertices closer to the root of the execution tree are preferred, as they are 
easier to process. 
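The lexicographic ordering above maps naturally onto a tuple sort key. In this Python sketch, `Candidate` and its fields are hypothetical stand-ins for ABET vertex data. The candidate list is assumed to be already filtered to input-sensitive vertices and non-LII vertices of unknown sensitivity. As a result, `input_sensitive=False` means "sensitivity not yet known".

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical view of an uncovered ABET vertex (not FIZZER's data structure)."""
    name: str
    input_sensitive: bool            # False means "sensitivity not yet known"
    sensitive_bytes: FrozenSet[int]  # empty when sensitivity is unknown
    input_bytes: int                 # input bytes read on the path to the vertex
    depth: int                       # distance from the root of the execution tree


def selection_key(v: Candidate, max_input_bytes: int) -> Tuple[int, int, float, int]:
    return (
        0 if v.input_sensitive else 1,             # 1. known-sensitive vertices first
        len(v.sensitive_bytes),                    # 2. fewer sensitive bytes
        abs(v.input_bytes - max_input_bytes / 2),  # 3. closer to half the maximum
        v.depth,                                   # 4. closer to the root
    )


def best_candidate(cands: Iterable[Candidate], max_input_bytes: int) -> Optional[Candidate]:
    """Return the lexicographically best candidate, or None if there is none."""
    return min(cands, key=lambda v: selection_key(v, max_input_bytes), default=None)
```

Returning None models the case where the heuristic must fall back to LII vertices.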


If no such vertex exists, we fall back to choosing an LII vertex. We use the distance function to select a promising LII vertex as follows. We select an uncovered vertex v if it is LII and all identified input-insensitive vertices with the same ABE and context have a greater absolute value of the distance function. If there are several such vertices v, we first prefer the ones with the smallest absolute value of the distance function, and then apply criteria similar to the previous ones.
Loop Head Selection Heuristics To fill the worklist H, we detect all loop heads on the path to v. The identified loop heads are grouped into buckets of exponentially increasing size according to the number of bytes read from the input. This ensures that we do not process so many loop heads that the search becomes impractical, while still exploring loop heads with diverse depths of loop iterations. From each bucket, we then pick the vertex that lexicographically minimizes the number of input bytes and the depth, and add it to the worklist H.
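The bucketing can be sketched as below. The text only states that bucket sizes grow exponentially. This sketch assumes power-of-two ranges of input-byte counts, computed with `int.bit_length()`, and a hypothetical `LoopHead` record holding the input-byte count and the depth.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class LoopHead(NamedTuple):
    name: str
    input_bytes: int  # bytes read from the input on the path to this loop head
    depth: int


def pick_loop_heads(heads: List[LoopHead]) -> List[LoopHead]:
    buckets: Dict[int, List[LoopHead]] = defaultdict(list)
    for h in heads:
        # bit_length(): 0 -> 0, 1 -> 1, 2..3 -> 2, 4..7 -> 3, ...
        # so each bucket covers twice as many byte counts as the previous one.
        buckets[h.input_bytes.bit_length()].append(h)
    # One representative per bucket: lexicographically minimal (input_bytes, depth).
    return [min(b, key=lambda h: (h.input_bytes, h.depth))
            for _, b in sorted(buckets.items())]
```

With this scheme, the number of representatives grows only logarithmically with the largest input-byte count on the path.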


5.2 Selecting an Open Vertex 


If the previous algorithm fails to select a vertex, all uncovered vertices have been processed. We try to make progress by selecting a vertex that is covered but still open. The rationale is that by exploring the open vertex, albeit otherwise covered, we hope to extend the ABET with new vertices where the analyses can continue further and some open vertices might become covered. In particular, we choose an open input-insensitive vertex with a small value of the distance function and identify an earlier loop head on the path to the root, as in the previous subsection. We then perform a random ABET traversal from the loop head and select the first open unprocessed vertex at which the search tries to visit its unvisited child. If this search fails as well, the analysis cannot make any further progress. It then returns null and the fuzzing loop terminates.


6 Implementation 


We implemented the approach in an experimental tool called FIZZER. The tool is implemented in C++ and consists of around 11,000 lines of code (in 125 files). Its only external dependency is the CLANG compiler and its libraries. The tool is open source and available under the zlib license, either as an artifact at Zenodo [13] or in the repository [14].

Given a C program to be analyzed, FIZZER first compiles it into LLVM bitcode using the CLANG compiler. The bitcode is then processed by our instrumenter. It first applies a standard LLVM pass to replace all switch instructions by sequences of if-else statements². It then finds and instruments all ABEs and function calls. Observe that we ignore br instructions, i.e., we do not care about the actual control flow. After the instrumentation, we link the instrumented LLVM bitcode with our implementations of the nondet_type() and __instr_*() functions into the final executable program, called target, which is repeatedly executed by the main FIZZER process.

Whenever FIZZER wants to execute the target with some input, it spawns a new process with the target executable. During the execution of target, the instrumented code tracks the current call context, collects data about the executed ABEs, and stores them in shared memory, which is accessible by the parent FIZZER process. Separating FIZZER and target into independent processes allows FIZZER to handle crashes of the target.


² We should also replace calls via function pointers by sequences of if-else statements. This pass is not implemented yet.
7 Evaluation


Experimental setup. To evaluate the implemented tool FIZZER, we use all branch-coverage benchmarks from Test-Comp 2023, the 5th Competition on Software Testing [5]. The benchmark set consists of 2933 benchmarks divided into 16 families. For presentation purposes, the substrings "ReachSafety" and "SoftwareSystems" in the family names are shortened to "rs" and "ss", respectively, in the rest of this section. For comparison, we used the three best-scoring tools³ from Test-Comp 2023, namely FUSEBMC [2], VERIFUZZ [21], and COVERITEST [6], in the versions in which they entered Test-Comp 2023. To obtain reproducible results, we asked the organizer of Test-Comp to evaluate FIZZER on the official Test-Comp infrastructure and to compare the obtained results with the official results of Test-Comp 2023. We stress that the results were thus produced by an independent third party and are independently reproducible. The resource limits of the competition are 15 minutes of CPU time and 15 GB of RAM. For a detailed description of the infrastructure and the settings used for the experimental evaluation, we refer to the competition report [5].


Results. The average branch coverage for each tool and each benchmark family is shown in Table 1. The table shows that FIZZER, which implements the approach proposed in this paper, is competitive with FUSEBMC, the winner of Test-Comp 2023, in most benchmark families. The exceptions are rs-Combinations, rs-ECA, and rs-Sequentialized. FIZZER is also competitive with the other state-of-the-art tools on all of the benchmark families. Although the table shows that FIZZER is the best on average in the benchmark families rs-ControlFlow and ss-SQLite-MemSafety, we do not consider these particular results significant, due to the small size of these families.

Figure 2 compares the branch coverage achieved by FIZZER and the other considered tools on individual benchmarks. On most of the benchmarks, FIZZER provides the same or worse coverage than FUSEBMC, but there are some benchmarks where it provides better coverage. It is also comparable with VERIFUZZ and provides better branch coverage than COVERITEST on a large number of benchmarks.

Out of all 2933 evaluated programs, there are 145 programs where FIZZER provides better coverage than any of the other compared tools. For comparison, COVERITEST provides the best coverage for 129 programs, FUSEBMC for 318, and VERIFUZZ for 180. The distribution of these benchmarks across the individual benchmark families can be found in Table 2.

Finally, note that FIZZER participated in Test-Comp 2024 and placed third in the category Cover-Branches, after FUSEBMC and FUSEBMC-AI.⁴


³ We do not compare against FUSEBMC IA [1], the runner-up in Test-Comp 2023, as we want to compare only with the best variant of each individual tool, not with all of its variants.

⁴ https://test-comp.sosy-lab.org/2024/results/results-verified/
Table 1: Average branch coverage of the tests generated by the individual tools for individual benchmark families and for all benchmarks. The results are in percent. The best result of each benchmark family is typeset in bold.

Benchmark family            #Benchmarks  COVERITEST  FIZZER  FUSEBMC  VERIFUZZ
rs-Arrays                           292        71.2    84.6     86.5      81.6
rs-BitVectors                        61        78.8    77.3     79.5      73.8
rs-Combinations                     671        34.8    42.0     50.7      37.6
rs-ControlFlow                       11         4.0    14.1     13.7      13.5
rs-ECA                               29        18.3    25.1     32.3      34.9
rs-Floats                           197        46.9    48.2     50.8      49.8
rs-Heap                             110        68.9    72.5     72.7      70.6
rs-Loops                            661        79.6    80.3     82.1      81.4
rs-ProductLines                     263        29.0    28.8     29.2      29.2
rs-Recursive                         51        78.4    84.2     85.8      76.0
rs-Sequentialized                    91        80.4    66.5     87.8      88.4
rs-XCSP                             114        99.8    88.5     91.7      92.6
ss-BusyBox-MemSafety                 62        16.9    32.9     33.2       0.0
ss-DeviceDriversLinux64-rs          287        20.6    20.5     20.6      19.7
ss-SQLite-MemSafety                   1         0.0     3.7      3.4       3.5
Termination-MainHeap                 32        95.6    95.3     95.1      90.9
All                                2933        54.3    57.3     61.0      56.2


8 Related Work 


The sensitivity analysis is a form of taint analysis, which is a technique popular 
in fuzzing [17,8,22,9,4,10,12,23,25,7,19,26]. The most frequent approach to taint 
analysis is propagating the taint information explicitly from taint sources (e.g., 
sources of input) through the program instructions [17,8,22,9,4,10,12,23,25]. Most 
of the approaches propagate taint information dynamically. However, some of them compute it statically [23], using control-flow information [25], or using concrete and symbolic execution [7]. Two papers [19,26] compute the tainted bytes by identifying input bytes that lead to different program executions. This is the approach most similar to ours. However, our approach also tries extreme values of typed inputs and performs more precise one-bit mutations, which are then extended to byte boundaries. The mentioned papers [19,26] only mutate whole bytes.

Gradient descent is used in fuzzing [8,26,9,16,24] in different forms. For instance, one paper [8] uses the forward and backward methods of finite differences to compute the partial derivatives. Additional constraints appearing in the control flow have also been considered [9]. Another approach [16] exponentially decreases the learning rate as the gradient descent progresses. Our approach differs mainly in taking multiple samples along the gradient direction in each descent step. The samples span several orders of magnitude along the line. This can both provide samples in the small region close to the global minimum and help to escape from local minima. We further compute the learning rate from the linear approximation of the function. Thanks to multi-sampling, this simplification is sufficient in practice. Lastly, our approach is extended with locking coordinates, which can contribute to escaping from local minima by avoiding extreme directions.

[Figure: three scatter plots of FIZZER coverage versus FUSEBMC, VERIFUZZ, and COVERITEST coverage]
Fig. 2: Scatter plots comparing branch coverage achieved by FIZZER and the other considered tools.

Our approach further uses a unique coverage goal. Other fuzzers monitor the actual control flow of the program execution (to measure, e.g., branch coverage), while we ignore it completely. Instead, we monitor the values of all ABEs and aim for their coverage. The byteshare analysis is also a novel approach, inspired by genetic algorithms. The random search we apply to select the target vertex is novel among fuzzers, but it has been used in the context of concolic execution [20].

The experimental evaluation in this paper compares the proposed approach with the best test-generation tools participating in Test-Comp 2023. All of these tools combine several analyses. Namely, FUSEBMC [2,3] combines bounded model checking (BMC), symbolic execution, and two fuzzers (AFL [27] and a selective fuzzer), and FUSEBMC IA [1] extends it further with interval analysis. VERIFUZZ [21] is built on top of AFL and an engine based on coverage-guided fuzzing, combined with the bounded model checker CBMC and the PRISM framework. COVERITEST [6] combines several model checkers.


9 Conclusion 


We presented a novel approach to gray-box fuzzing, which aims to generate tests that cover both possible values of each atomic Boolean expression. To reach this goal, our approach uses a dynamic computation to identify the bytes that influence the value of a given Boolean expression. Further, it employs two analyses to find the values of these bytes that yield the desired value of the Boolean expression. One of these analyses is based on gradient descent.

Table 2: The numbers of benchmarks in the individual benchmark families where a given tool achieved better branch coverage than all the other considered tools.

Benchmark family                                    COVERITEST  FIZZER  FUSEBMC  VERIFUZZ
ReachSafety-Arrays                                           0      12       18         4
ReachSafety-BitVectors                                       0       3        1         2
ReachSafety-Combinations                                    96      23      243       139
ReachSafety-ControlFlow                                      2       1        1         1
ReachSafety-ECA                                              1       1        6        15
ReachSafety-Floats                                           0      16        1         0
ReachSafety-Heap                                             1       8        0         2
ReachSafety-Loops                                            0       6        6         0
ReachSafety-ProductLines                                     0      33        0         3
ReachSafety-Recursive                                        0       1        4         0
ReachSafety-Sequentialized                                   0       2       16        14
ReachSafety-XCSP                                            14       0        0         0
SoftwareSystems-BusyBox-MemSafety                            4      27       19         0
SoftwareSystems-DeviceDriversLinux64-ReachSafety             9      11        3         0
SoftwareSystems-SQLite-MemSafety                             0       1        0         0
Termination-MainHeap                                         2       0        0         0
All                                                        129     145      318       180


We implemented the proposed approach in an experimental tool called FIZZER. 
An independent evaluation shows that, despite being a pure gray-box fuzzer, it 
is competitive with the state-of-the-art tools competing in Test-Comp 2023. 

In the future, we plan to add support for calls via function pointers and gradient descent tailored to floating-point values. We will also investigate an extensible architecture that allows running different external analyses on the vertices of the execution tree. In particular, this would allow running techniques such as symbolic execution on vertices that cannot be covered by gradient descent alone, which could improve the performance of our tool even further.
Abstract. The computation of bottom strongly connected components
(BSCCs) is a fundamental task in model checking, as well as in character- 
izing the attractors of dynamical systems. As such, symbolic algorithms 
for BSCCs have received special attention, and are based on the idea 
that the computation of an SCC can be stopped early, as soon as it is 
deemed to be non-bottom. 


In this paper we introduce PENDANT, a new symbolic algorithm for com- 
puting BSCCs which runs in linear symbolic time. In contrast to the stan- 
dard approach of escaping non-bottom SCCs, PENDANT aims to start the 
computation from nodes that are likely to belong to BSCCs, and thus is 
more effective in sidestepping SCCs that are non-bottom. Moreover, we 
employ a simple yet powerful deadlock-detection technique that quickly identifies singleton BSCCs before the main algorithm is run. Our experimental evaluation on three diverse datasets of 553 models demonstrates
the efficacy of our two methods: PENDANT is decisively faster than the 
standard existing algorithm for BSCC computation, while deadlock de- 
tection improves the performance of each algorithm significantly. 


Keywords: BDDs · strongly connected components · symbolic algorithms


1 Introduction 


The decomposition of a graph into its strongly connected components (SCCs) is one of the most standard tasks in automated system verification. For example, model checking against LTL and ω-regular properties reduces to computing cycles, while fairness conditions are typically checked given an SCC decomposition of the graph [21,34]. Of special interest are bottom (terminal) SCCs, or BSCCs, i.e., SCCs that cannot be escaped once entered. BSCCs are used to speed up LTL model checking [28], and they capture the long-run properties of Markov chains and Markov decision processes [23113]. They also correspond to the attractors of dynamical systems, as in signal transduction networks [29,33].


Large-scale model-checking settings comprise huge systems that suffer from the state-space explosion problem. These systems are usually represented compactly by a model, e.g., by means of a programming language, a logic, or a reaction network. Their size is exponential in the size of their description. Nevertheless, the system typically exhibits numerous symmetries that can be preserved when the state space is represented symbolically rather than explicitly. One predominant symbolic representation is via (reduced/ordered) Binary Decision Diagrams (BDDs) [10], which are found at the core of many classic and modern model checkers [14242012615]. To benefit from the symbolic representation, analysis algorithms typically only have coarse-grained access to the graph. They query for the successors (Post(X)) and predecessors (Pre(X)) of a set of nodes X represented by a single BDD. Each such operation counts as a symbolic step. As symbolic steps are significantly slower than primitive operations, they serve as the complexity measure of symbolic algorithms [9118[12[25].


Due to the prevalence of SCC decomposition, the problem has been studied extensively in the symbolic setting, starting with the XIE-BEEREL algorithm of symbolic complexity O(n²). LOCKSTEP [8] improves this bound to O(n log n), while SKELETON [17] achieves O(n) time at the expense of O(n) symbolic space (i.e., number of BDDs). The most recent step in this progression is CHAIN [25], which achieves both O(n) symbolic time and O(log n) symbolic space. In practice, heuristics aim to further improve the running time [91]16]34].


Naturally, BSCCs can be computed by using one of the aforementioned algorithms to obtain an SCC decomposition and checking whether each SCC is indeed a BSCC. In practice, however, computing an SCC can be expensive, as it typically requires traversing it multiple times. For this reason, algorithms dedicated to BSCCs have received special attention. Although these do not offer theoretical improvements, they attempt to minimize the number of non-bottom SCCs computed and thus perform better in practice.


The predominant general-purpose BSCC-decomposition algorithm is BWDFWD, which is a modification of XIE-BEEREL [32] and has O(n) complexity. Effectively, this algorithm aborts the computation of an SCC S as soon as it determines that S cannot be a BSCC. It then removes S from the graph, together with any node that can reach S. A recently introduced preprocessing technique, called interleaved transition-guided reduction (ITGR) [6], aims to detect and discard further non-bottom SCCs before the main algorithm is run. ITGR is general-purpose and was shown to be effective in handling asynchronous Boolean Network models [3I1]2]. However, as these algorithms are typically executed on huge inputs, scalability often remains an issue. We address this challenge here.


1.1 Our contributions 


The PENDANT algorithm. We develop a new, linear-time algorithm for sym- 
bolic BSCC computation, called PENDANT, drawing inspiration from the recent 
CHAIN algorithm [25]. In contrast to the existing BSCC paradigm based on stop- 
ping the computation of SCCs that are deemed non-bottom, PENDANT aims to 
start such computations from SCCs that are likely to be bottom. To achieve this, 
while PENDANT computes an SCC, it also implicitly (at no extra cost) traverses 
the quotient graph Q downwards, making future SCC computations start from 
nodes that are close to the bottom of Q, and thus discover a BSCC quickly. 
Deadlock detection. We employ a simple yet powerful preprocessing technique, called deadlock detection. It is based on two insights. First, each deadlock forms a singleton SCC that is a BSCC. Second, all deadlocks can be computed effectively in a single symbolic step.
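A node is a deadlock exactly when it lies outside Pre(V). The deadlocks of G are therefore V \ Pre(V), which takes one Pre operation. The Python sketch below uses explicit sets as a stand-in for BDDs. The adjacency-map encoding and the function names are assumptions for illustration, and nodes missing from the map are treated as having no successors.

```python
from typing import Dict, Set


def pre(succ: Dict[int, Set[int]], X: Set[int]) -> Set[int]:
    """Pre(X): all nodes with at least one edge into X (one symbolic step)."""
    return {u for u, ws in succ.items() if ws & X}


def deadlocks(V: Set[int], succ: Dict[int, Set[int]]) -> Set[int]:
    """Nodes without outgoing edges, i.e. V minus Pre(V); each is a singleton BSCC."""
    return V - pre(succ, V)
```

A node with only a self-loop is not a deadlock under this definition. Its singleton SCC is still a BSCC, but that BSCC is left to the main algorithm.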


Experimental evaluation. We implement PENDANT and the deadlock-detection preprocessing. We then evaluate their performance on computing the BSCCs of a large pool of models from three diverse datasets, namely, (i) Petri nets from the Model Checking Contest [22], (ii) DiVinE models from the Benchmark of Explicit Models [27], and (iii) asynchronous Boolean Network models [312]. Our experiments show the following. (i) PENDANT is decisively more efficient than BWDFWD. (ii) Deadlock detection improves the performance of both algorithms. (iii) After deadlock detection, ITGR is scarcely effective.


2 Preliminaries 


In this section we present standard definitions and the BWDFWD algorithm. 


2.1 Graphs, Bottom SCCs and Symbolic Representations 


Graphs. We consider directed graphs G = (V, E), where V is a set of nodes and E ⊆ V × V is a set of edges. We often write u → v to denote an edge (u, v) ∈ E. For a node v, the image of v is Post(v) = {u | v → u}, while the pre-image of v is Pre(v) = {u | u → v}. These notions are extended to sets of nodes X in the natural way, i.e., Post(X) = ⋃_{v∈X} Post(v) and Pre(X) = ⋃_{v∈X} Pre(v).


A path is a sequence P = v_1 → v_2 → ⋯ → v_k, in which case we also write v_1 ⇝ v_k and say that v_k is reachable from v_1. The length of P is |P| = k − 1. For a set of nodes X, we write Fwd(X) = {u | ∃v ∈ X, v ⇝ u} for the forward set of X and Bwd(X) = {u | ∃v ∈ X, u ⇝ v} for the backward set of X. We call a set X ⊆ V forward-closed if Fwd(X) ⊆ X. The restriction of G to a set X ⊆ V is the graph G[X] = (X, (X × X) ∩ E). A node v ∈ V is called a deadlock if it has no outgoing edges, i.e., Post(v) = ∅.
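These definitions can be transcribed directly over explicit Python sets, which is convenient for experimenting on small graphs. This is an explicit-state sketch, not a symbolic implementation. The adjacency-map encoding `Graph` and all function names are assumptions. Fwd and Bwd are computed as fixpoints of Post and Pre. They include X itself, because a single node is a path of length 0.

```python
from typing import Callable, Dict, Set

Graph = Dict[int, Set[int]]  # explicit adjacency map: node -> set of successors


def post(g: Graph, X: Set[int]) -> Set[int]:
    return {w for v in X for w in g.get(v, set())}


def pre(g: Graph, X: Set[int]) -> Set[int]:
    return {u for u, ws in g.items() if ws & X}


def _closure(step: Callable[[Set[int]], Set[int]], X: Set[int]) -> Set[int]:
    reached, frontier = set(X), set(X)
    while frontier:  # stop once no new nodes are discovered
        frontier = step(frontier) - reached
        reached |= frontier
    return reached


def fwd(g: Graph, X: Set[int]) -> Set[int]:
    return _closure(lambda Y: post(g, Y), X)


def bwd(g: Graph, X: Set[int]) -> Set[int]:
    return _closure(lambda Y: pre(g, Y), X)


def restrict(g: Graph, X: Set[int]) -> Graph:
    """G[X]: keep only nodes in X and edges between them."""
    return {u: g.get(u, set()) & X for u in X}


def is_deadlock(g: Graph, v: int) -> bool:
    return not g.get(v)
```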


Bottom Strongly Connected Components (BSCCs). A strongly connected component (SCC) of G is a maximal set of nodes S such that for all u, v ∈ S we have u ⇝ v. Each node v belongs to exactly one SCC, written SCC(v). A set X ⊆ V is called SCC-closed if for each v ∈ X, we have SCC(v) ⊆ X. The diameter of an SCC S is the maximum distance between two nodes in S, i.e.,

$$\delta(S) = \max_{u,v \in S} \; \min_{P:\, u \rightsquigarrow v} |P|.$$
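On an explicit graph, δ(S) can be computed with one breadth-first search per node of S. The sketch below restricts each search to S. This is sound because any path between two nodes of an SCC only visits nodes of that SCC. The function names and the adjacency-map encoding are assumptions. The function raises ValueError if S is not strongly connected.

```python
from collections import deque
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # explicit adjacency map: node -> set of successors


def distances_within(g: Graph, S: Set[int], src: int) -> Dict[int, int]:
    """BFS distances from src, using only nodes of S."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in g.get(u, set()):
            if w in S and w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def diameter(g: Graph, S: Set[int]) -> int:
    """delta(S): the largest shortest-path distance between two nodes of S."""
    best = 0
    for u in S:
        dist = distances_within(g, S, u)
        if len(dist) != len(S):
            raise ValueError("S is not strongly connected")
        best = max(best, max(dist.values()))
    return best
```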

The quotient graph of G represents each SCC of G by a single node. It has a directed edge S → S′ between distinct SCCs S ≠ S′ iff Post(S) ∩ S′ ≠ ∅, i.e., there exist nodes u ∈ S and v ∈ S′ with u → v. The quotient graph is a directed acyclic graph. Its leaf nodes represent the SCCs that have no outgoing edges to any other SCC, called bottom SCCs (or BSCCs). We denote by SCCs(G) and BSCCs(G) the sets of SCCs and BSCCs of G, respectively.


The problem targeted in this paper is the computation of BSCCs. The following two simple properties of BSCCs are used throughout the paper.


Proposition 1. An SCC S is a BSCC if and only if Fwd(S) = S. 


Proposition 2. If S is a BSCC, then there is no BSCC in Bwd(S) \ S.


Symbolic operations and complexity. In large-scale model-checking settings, graphs are typically represented symbolically. One popular symbolic representation is Binary Decision Diagrams (BDDs) [19]. In particular, the node set V and edge relation E are represented compactly as BDDs, while algorithms use BDDs as data structures for representing subsets of V and E. The basic BDD operations give only coarse-grained access to the graph. Given a BDD representing a set of nodes X, an algorithm can access Pre(X) and Post(X), each of which counts as one symbolic step. The complexity of symbolic algorithms is measured in the number of symbolic steps they execute [12,25], since these are much slower than elementary operations (e.g., incrementing a counter). Basic set operations on BDDs (union, intersection, etc.) also do not count towards the time complexity.* Finally, given a set X represented as a BDD, we use a PICK(X) operation, which returns an arbitrary node v ∈ X. This operation is natural and efficient for BDDs, and it is commonly used in symbolic SCC algorithms [17,8,25].
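To make the cost model concrete, the following Python sketch wraps an explicit adjacency map, which stands in for the BDDs. It charges one symbolic step per Pre or Post call and does not count set operations or PICK. The class name and the deterministic choice `min(X)` for PICK are assumptions, since PICK may return any element.

```python
from typing import Dict, Set


class CountingGraph:
    """Explicit-set stand-in for a BDD-encoded graph.

    Only Pre and Post are charged as symbolic steps; set operations and
    PICK are not counted, mirroring the cost model described in the text.
    """

    def __init__(self, succ: Dict[int, Set[int]]):
        self.succ = succ
        self.steps = 0

    def post(self, X: Set[int]) -> Set[int]:
        self.steps += 1
        return {w for v in X for w in self.succ.get(v, set())}

    def pre(self, X: Set[int]) -> Set[int]:
        self.steps += 1
        return {u for u, ws in self.succ.items() if ws & X}

    @staticmethod
    def pick(X: Set[int]) -> int:
        # Any element is allowed; min() just makes the choice deterministic.
        return min(X)
```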


2.2 The BWDFWD Algorithm for BSCCs


The symbolic computation of BSCCs(G) can be performed by computing each S ∈ SCCs(G) using some existing symbolic algorithm [8211718125], and then reporting that S is a BSCC iff Post(S) ⊆ S (following Proposition 1). This approach runs in O(n) symbolic steps when using CHAIN [25] or SKELETON [17]. However, it can be unnecessarily slow in practice, as it typically spends considerable time computing SCCs that are not BSCCs. For this reason, algorithms dedicated to computing BSCCs have been developed. The standard symbolic BSCC algorithm is BWDFWD, which we briefly present here.


The Backward-Forward BSCC algorithm. BWDFWD is an adaptation of the standard XIE-BEEREL algorithm [32]. Algorithm 2 follows its recent presentation in [6], adapted to our setting. The algorithm uses the standard mechanism for computing SCCs symbolically: given a pivot node v, we have SCC(v) = Fwd(v) ∩ Bwd(v). Given such a node v, BWDFWD first calls Algorithm 1 (Line 3) to retrieve the backward set Bwd(v), called the basin of v, using a standard fixpoint computation. It then uses a similar fixpoint computation to retrieve Fwd(v) in F (Line 5). This computation is terminated early if the algorithm discovers that Fwd(v) ⊈ Bwd(v). In that case Fwd(v) ≠ SCC(v), so by Proposition 1, SCC(v) is not a BSCC. On the other hand, if the computation is carried to a fixpoint, then Fwd(v) ⊆ Bwd(v) and thus Fwd(v) = SCC(v). Proposition 1 then guarantees that SCC(v) is a BSCC. Since the check in Line 8 succeeds, BWDFWD correctly outputs SCC(v) as a BSCC. Finally, observe that the basin Bwd(v) contains no BSCC, except possibly SCC(v), which was just output. The algorithm hence safely removes Bwd(v) from G and proceeds recursively (Line 10).

* For many algorithms, including ours, counting set operations does not affect the asymptotic complexity.

Algorithm 1: BWD
Input: A graph G = (V, E) and a node v ∈ V
1  B = {v}
2  while Pre(B) ⊈ B do              // Fixpoint not reached
3      B = B ∪ Pre(B)               // Update with new predecessors
4  return B

Algorithm 2: BWDFWD
Input: A graph G = (V, E)
1  if V = ∅ then return
2  v = PICK(V)                      // Pick a pivot
3  B = BWD(G, v)                    // Compute safe-to-remove nodes
4  F = ∅; Layer = {v}
5  while Layer ≠ ∅ and F ⊆ B do     // Compute and detect BSCC
6      F = F ∪ Layer
7      Layer = Post(Layer) \ F
8  if F ⊆ B then                    // Output if BSCC
9      output F
10 BWDFWD(G[V \ B])                 // Recursive call w/o safe nodes
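For small experiments, BWDFWD can be transcribed into Python over explicit sets. In this sketch, `Stepper` is a hypothetical helper that counts Pre and Post calls as symbolic steps on the induced subgraph G[V]. The tail-recursive call on G[V \ B] becomes a loop, and PICK is made deterministic with `min`.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]  # explicit adjacency: node -> successors


class Stepper:
    """Counts Pre/Post calls (symbolic steps) on the subgraph induced by V."""

    def __init__(self, succ: Graph):
        self.succ = succ
        self.steps = 0

    def post(self, X: Set[int], V: Set[int]) -> Set[int]:
        self.steps += 1
        return {w for v in X for w in self.succ.get(v, set()) if w in V}

    def pre(self, X: Set[int], V: Set[int]) -> Set[int]:
        self.steps += 1
        return {u for u in V if self.succ.get(u, set()) & X}


def bwd(g: Stepper, V: Set[int], v: int) -> Set[int]:
    """Algorithm 1: backward fixpoint from v inside G[V]."""
    B = {v}
    while True:
        new = g.pre(B, V)
        if new <= B:  # Pre(B) <= B: fixpoint reached
            return B
        B |= new


def bwd_fwd(g: Stepper, V: Set[int]) -> List[Set[int]]:
    """Algorithm 2, with the tail recursion on G[V - B] turned into a loop."""
    V = set(V)
    bsccs: List[Set[int]] = []
    while V:
        v = min(V)                # PICK(V); min() only for determinism
        B = bwd(g, V, v)          # basin of v: safe to remove afterwards
        F: Set[int] = set()
        layer = {v}
        while layer and F <= B:   # stop early once F leaves the basin
            F |= layer
            layer = g.post(layer, V) - F
        if F <= B:                # Fwd(v) <= Bwd(v), so F = SCC(v) is bottom
            bsccs.append(F)
        V -= B                    # continue on G[V - B]
    return bsccs
```

On the graph 1 ⇄ 2 → 3 ⇄ 4, 5 → 1, with an isolated node 6, the sketch reports {3, 4} and {6}. The first pivot, 1, only discovers that it is not bottom after walking into node 3. This mirrors the wasted forward work discussed below.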


It is not hard to see that BWDFWD runs in O(n) symbolic steps. It offers two practical improvements over general SCC-decomposition algorithms. First, in each recursive call, the algorithm avoids computing SCCs in Bwd(v) \ SCC(v), as they are guaranteed to be non-bottom. Nodes in this set are only accessed during the basin computation in Algorithm 1, which is cheaper. Second, it stops computing SCC(v) as soon as it discovers that Fwd(v) ⊈ Bwd(v), as SCC(v) is then not a BSCC. However, the algorithm can spend significant time computing Fwd(v) before it discovers that Fwd(v) ⊈ Bwd(v), which results in wasteful symbolic operations. The following example illustrates this issue on a small graph.


Example. Fig. 1 shows a graph G = (V, E) (a) and two recursion trees. The left-most tree (b) illustrates the execution of BWDFWD on G. Each node in the tree has its variables subscripted by the pivot node v chosen in the corresponding recursive call, and the variables show their values in that call. For example, F_v is the value of F after the loop of Line 5 has completed, given that v was chosen as the pivot in that recursive call. A node number is underlined in F_v if the node lies outside the backward set B_v and thus cuts the computation of F_v short (Line 5). Observe that the algorithm makes four recursive calls. The second (v = 2) and third (v = 3) calls spend considerable time in the forward computation (of the sets F_2 and F_3, respectively). They essentially compute SCC(2) and SCC(3) before determining that these are not BSCCs.


[Figure: (a) input graph; (b) recursion tree of BWDFWD; (c) recursion tree of PENDANT]

Fig. 1: An example input graph (a) and the recursion trees of the BWDFWD (b) and PENDANT (c) algorithms on it.


3 The PENDANT Algorithm for BSCCs 


In this section we present our new algorithm, PENDANT, for computing BSCCs symbolically. Like BWDFWD, PENDANT runs in time linear in the number of nodes of the input graph. In particular, we have the following theorem.


Theorem 1. Given a graph G = (V, E) of n nodes, PENDANT computes BSCCs(G) in $O\big(\sum_{S \in \mathrm{SCCs}(G)} \delta(S)\big) = O(n)$ symbolic time.
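The step from the sum to O(n) can be made explicit. Assuming δ(S) denotes the diameter of S (the symbol is not defined in this excerpt), every shortest path between two nodes of S stays inside S and visits each node at most once, so δ(S) ≤ |S| − 1; since the SCCs partition V, the sum is at most n:

```latex
\sum_{S \in \mathrm{SCCs}(G)} \delta(S)
  \;\le\; \sum_{S \in \mathrm{SCCs}(G)} \bigl(|S| - 1\bigr)
  \;\le\; \sum_{S \in \mathrm{SCCs}(G)} |S|
  \;=\; |V| \;=\; n
```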


However, as we will see in Section 5, in practice PENDANT typically requires fewer symbolic steps than BWDFWD. Intuitively, this is achieved by making, over time, smarter choices of pivot nodes v to start the SCC computation, meaning nodes v that are more likely to have SCC(v) close to the leaves of the quotient graph. In turn, this reduces the number of non-bottom SCCs computed throughout the execution of the algorithm, which reduces the number of symbolic steps.


3.1 PENDANT 


PENDANT is shown in Algorithm 4 and uses FWDLASTLAYER, shown in Algorithm 3, as a sub-procedure.


Algorithm 3: FWDLASTLAYER
Input: A graph G = (V, E) and a node v ∈ V
1 F = ∅; Layer = {v}; L = ∅
2 while Layer ≠ ∅ do              // Fixpoint not reached
3     F = F ∪ Layer               // Update with new successors
4     L = Layer                   // L stores the last layer of nodes reached
5     Layer = Post(Layer) \ F     // Compute the new layer
6 return F, L


FWDLASTLAYER. FWDLASTLAYER computes the forward set Fwd(v) of a 
node v using a standard fixpoint computation. The algorithm also keeps track 
of the last layer L of nodes discovered during the fixpoint computation, and 
returns both Fwd(v) (represented in F) and L. Intuitively, Fwd(v) is used by 
PENDANT for computing SCC(v) and testing whether it is a BSCC, while L is 
used to guide the selection of future pivots downwards in the quotient graph. 
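A minimal explicit-state sketch of FWDLASTLAYER follows, assuming the graph is a Python dict mapping each node to its successor set and that `post` (a hypothetical helper) computes the image restricted to the current node set. Each loop iteration corresponds to one Post step in the symbolic setting.

```python
from typing import Dict, Set, Tuple

Graph = Dict[int, Set[int]]  # node -> successors; explicit stand-in for a BDD


def post(g: Graph, xs: Set[int], nodes: Set[int]) -> Set[int]:
    """Image of xs restricted to the current node set (one symbolic step)."""
    return {w for u in xs for w in g[u] if w in nodes}


def fwd_last_layer(g: Graph, nodes: Set[int], v: int) -> Tuple[Set[int], Set[int]]:
    """Return (Fwd(v), last non-empty BFS layer of Fwd(v)) inside G[nodes]."""
    f: Set[int] = set()
    layer = {v}
    last: Set[int] = set()
    while layer:                  # fixpoint not reached
        f |= layer                # add the newly discovered nodes
        last = layer              # remember the most recent non-empty layer
        layer = post(g, layer, nodes) - f
    return f, last
```

For example, on the graph 1 → 2 → 3 → 2 with pivot 1, the layers are {1}, {2}, {3}, so the call returns F = {1, 2, 3} and L = {3}.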


PENDANT. On input G = (V, E), PENDANT begins by PICK'ing an arbitrary pivot node v (Line 2), with the aim to compute SCC(v) and test whether it is a BSCC. For this purpose, it calls FWDLASTLAYER to retrieve F = Fwd(v) and the last layer L of Fwd(v) (Line 5). It then computes S = SCC(v) by calling BWD (Algorithm 1) to compute the backward set of v restricted to Fwd(v) (Line 6). At this point, there are two cases.


— If F \ S ≠ ∅, then S is not a BSCC. At this point, the set W = F \ S is guaranteed to contain a BSCC, and the algorithm resumes its search for a BSCC in this set, running a new iteration of the main loop. Moreover, the algorithm attempts to pick a new pivot in the last layer of Fwd(v) (Line 10), as opposed to an arbitrary node in W. Intuitively, this allows PENDANT to traverse the quotient graph downwards towards its leaves, and thus quickly pick a pivot v such that SCC(v) is a BSCC.

— If F \ S = ∅, then D = SCC(v) is guaranteed to be a BSCC; the loop terminates, since D is now non-empty, and D is reported (Line 15). Then the backward set B of D is computed (Line 16) and removed from the graph, as it is guaranteed to not contain any other BSCC, and the algorithm proceeds recursively in the remaining graph (Line 17). Note that the number of recursive calls of PENDANT thus equals the number of BSCCs in the input graph.
Algorithm 4: PENDANT
Input: A graph G = (V, E)
1  if V = ∅ then return
2  v = PICK(V)                            // Pick a pivot
3  W = V; D = ∅                           // D stores a BSCC, once found
4  while D = ∅ do                         // Find a BSCC
5      F, L = FWDLASTLAYER(G[W], v)       // Get Fwd(v) and its last layer
6      S = BWD(G[F], v)                   // Compute SCC(v)
7      if F \ S ≠ ∅ then                  // Not a BSCC
8          W = F \ S                      // W contains a BSCC, continue here
9          if L ∩ W ≠ ∅ then              // If there are candidates in last layer,
10             v = PICK(L ∩ W)            // pick new pivot from the last layer
11         else
12             v = PICK(W)                // otherwise, pick any v from W
13     else
14         D = S
15 output D
16 B = BWD(D, G)                          // Compute safe-to-remove nodes
17 PENDANT(G[V \ B])                      // Recursive call w/o safe nodes
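The following Python sketch mirrors Algorithm 4 line by line on an explicit graph, again assuming a dict-of-successors encoding and hypothetical helpers `post`, `pre`, `bwd`, and `fwd_last_layer`. It illustrates the control flow only and is not the authors' BDD-based implementation: PICK is fixed to the smallest node, and the tail-recursive call of Line 17 becomes another round of the outer loop.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]  # node -> successors; explicit stand-in for a BDD


def post(g: Graph, xs: Set[int], nodes: Set[int]) -> Set[int]:
    return {w for u in xs for w in g[u] if w in nodes}


def pre(g: Graph, xs: Set[int], nodes: Set[int]) -> Set[int]:
    return {u for u in nodes if g[u] & xs}


def bwd(g: Graph, xs: Set[int], nodes: Set[int]) -> Set[int]:
    """Backward closure of xs inside G[nodes] (BWD)."""
    b, layer = set(), set(xs)
    while layer:
        b |= layer
        layer = pre(g, layer, nodes) - b
    return b


def fwd_last_layer(g: Graph, nodes: Set[int], v: int) -> Tuple[Set[int], Set[int]]:
    f, layer, last = set(), {v}, set()
    while layer:
        f |= layer
        last = layer
        layer = post(g, layer, nodes) - f
    return f, last


def pendant(g: Graph) -> List[Set[int]]:
    """Return the BSCCs in the order they are output (Line 15)."""
    remaining = set(g)                                # V of the current call
    bsccs: List[Set[int]] = []
    while remaining:                                  # Line 1
        v = min(remaining)                            # Line 2: arbitrary pivot
        w, d = set(remaining), set()                  # Line 3
        while not d:                                  # Line 4
            f, last = fwd_last_layer(g, w, v)         # Line 5
            s = bwd(g, {v}, f)                        # Line 6: SCC(v)
            if f - s:                                 # Line 7: not a BSCC
                w = f - s                             # Line 8
                cand = last & w                       # Line 9
                v = min(cand) if cand else min(w)     # Lines 10 and 12
            else:
                d = s                                 # Line 14
        bsccs.append(d)                               # Line 15
        remaining -= bwd(g, d, remaining)             # Lines 16-17
    return bsccs
```

On `{1: {2}, 2: {1, 3}, 3: {4}, 4: {3}, 5: set()}`, the first pivot 1 yields the last layer {4}, so the second pivot is 4 and the BSCC {3, 4} is found without any further forward search from an intermediate SCC.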


Observe the qualitative differences between PENDANT and BWDFWD. First, BWDFWD begins with a backward search from the pivot v, while PENDANT begins with a forward search from v. Second, BWDFWD removes the basin Bwd(v) from G as soon as SCC(v) is deemed to be non-bottom, while PENDANT delays this step, and only computes (and removes) the basins of BSCCs. Third, BWDFWD picks pivots completely arbitrarily, whereas PENDANT, any time it computes an SCC S that is not bottom, picks the next pivot from a distant successor of S in the quotient graph, which allows it to discover BSCCs quickly.


Example. Let us revisit our example in Fig. 1. The right-most recursion tree (c) illustrates the computation of PENDANT. Since there is only one BSCC, there is only one recursive call, but the node is subdivided to show each iteration of the loop in Line 4. As before, variables are subscripted with the pivot node v of that iteration. Initially, PENDANT arbitrarily chooses v = 1, like BWDFWD, and computes Fwd(1). Then, it deems SCC(1) a non-bottom SCC, and the next pivot is chosen from the last layer of Fwd(1), i.e., v = 10. Effectively, PENDANT has reached a leaf of the quotient graph (the only leaf, in this case), and thus identifies a BSCC quickly. Importantly, it skips the expensive computation of two SCCs with large diameters (SCC(2) and SCC(3)), in contrast to BWDFWD.


3.2 Correctness 


We now turn our attention to the correctness of PENDANT. We start with two 
simple lemmas regarding forward-closed sets. 


118 A. B. Jakobsen, R. S. M. Jørgensen, J. van de Pol, A. Pavlogiannis 


Lemma 1. Assume that X ⊆ V is forward-closed, and D ⊆ X is a BSCC. Then X \ Bwd(D) is forward-closed.


Proof. For any node v ∈ X, if Fwd(v) ∩ Bwd(D) ≠ ∅ then clearly v ∈ Bwd(D) and hence v ∉ X \ Bwd(D). Thus, for every node v ∈ X \ Bwd(D), we have Fwd(v) ∩ Bwd(D) = ∅, and since X is forward-closed, we have Fwd(v) ⊆ X. Hence Fwd(v) ⊆ X \ Bwd(D).


Lemma 2. For any node v, the set Fwd(v) \ SCC(v) is forward-closed. 


Proof. For any node u ∈ Fwd(v), if Fwd(u) ∩ SCC(v) ≠ ∅, then u ∈ SCC(v). Hence for every node u ∈ Fwd(v) \ SCC(v), we have Fwd(u) ∩ SCC(v) = ∅, and since also Fwd(u) ⊆ Fwd(v), we obtain Fwd(u) ⊆ Fwd(v) \ SCC(v). The desired result follows.


We now prove the soundness of PENDANT, i.e., every SCC outputted in Line 15 is a BSCC. For this, we prove the following stronger lemma, which states three invariants maintained by the algorithm.


Lemma 3. At each iteration of the main loop of PENDANT, the following in- 
variants hold: (a) V and W are forward-closed, (b) S is an SCC, and (c) D is 
a BSCC. 


Proof. Before entering the first iteration of the loop, W and V are both the whole node set of the input graph, hence both are trivially forward-closed. Now, assuming that W is forward-closed, we have that F = Fwd(v) in Line 5. In turn, this implies that S = SCC(v) in Line 6. Moreover, due to Proposition 1, if F ⊆ S in Line 7 (i.e., F \ S = ∅), then S is a BSCC; thus D outputted in Line 15 is indeed a BSCC.


To complete the invariant proof, it remains to argue that V' and W' remain forward-closed after they have been updated. There are two cases.


1. If the algorithm proceeds with another iteration of the main loop, we have V' = V and W' = F \ S. Since F = Fwd(v) and S = SCC(v), Lemma 2 implies that W' is forward-closed.

2. Otherwise, the algorithm proceeds with a new recursive call in Line 17. We have that W' = V' = V \ B, where B = Bwd(D), and D is a BSCC. By Lemma 1, V \ B is forward-closed, as desired.


Observe that invariant (c) of Lemma 3 establishes the soundness of PENDANT. Next we establish its completeness, thereby concluding the correctness of PENDANT.


Lemma 4. PENDANT outputs every BSCC of the input graph. 
Proof. First, observe that every time PENDANT calls itself recursively in Line 17, it has outputted a BSCC D, and the recursion proceeds on the subgraph induced by V \ Bwd(D). Due to Proposition 2, D is the only BSCC in V ∩ Bwd(D), so the algorithm has outputted all BSCCs in V ∩ Bwd(D). Hence, in each recursive call on a graph G = (V, E), the node set V contains all the BSCCs not already outputted by the algorithm. It thus suffices to argue that, in each recursive call, the main loop eventually terminates, as in doing so it outputs a BSCC.


In each iteration of the main loop that does not output a BSCC, the set W is updated to W' = F \ S (Line 8), where F = Fwd(v) and S = SCC(v) for the current pivot v. Since F ⊆ W and S ≠ ∅ (as v ∈ S), it follows that W' ⊊ W. As W is finite, the loop must eventually terminate.


3.3 Complexity 


Although the linear upper bound of BWDFWD is trivial, the case of PENDANT is more involved. This is because a call to FWDLASTLAYER may compute forward sets that consist of many layers (and thus cost many symbolic steps), while these sets are not immediately removed from the graph (as opposed to the backward set computed by BWDFWD), and are accessed again in future iterations of the algorithm. Nevertheless, a careful analysis shows that the complexity is indeed linear. We start with a simple lemma.


Lemma 5. Assume that X ⊆ V is forward-closed and D ⊆ X is a BSCC. Then Bwd(D) ∩ X is SCC-closed.


Proof. Consider any node v ∈ Bwd(D) ∩ X. Since X is forward-closed, we have Fwd(v) ⊆ X and thus SCC(v) ⊆ X. Moreover, Bwd(v) ⊆ Bwd(D) and thus SCC(v) ⊆ Bwd(D). Hence SCC(v) ⊆ Bwd(D) ∩ X.


We now prove the complexity of PENDANT.
Lemma 6. PENDANT runs in $O\big(\sum_{S \in \mathrm{SCCs}(G)} \delta(S)\big) = O(n)$ symbolic steps.


Proof. In each recursive call, PENDANT makes symbolic steps to (i) compute the SCCs of the picked pivots (Lines 5 and 6), and (ii) compute the backward set of the outputted BSCC (Line 16). We will argue that, in total, case (i) takes $\sum_{S \in \mathrm{SCCs}(G)} 3\delta(S)$ time, while case (ii) takes $\sum_{S \in \mathrm{SCCs}(G)} \delta(S)$ time.


We start with case (i). For a given pivot v, computing SCC(v) is done in two steps: (a) Line 5 computes the forward set F of v restricted to the node set W, while (b) Line 6 computes SCC(v) as the backward set of v restricted to F. Clearly, (b) takes δ(SCC(v)) symbolic steps; thus, summing over all pivots v, (b) takes at most $\sum_{S \in \mathrm{SCCs}(G)} \delta(S)$ time. To bound the time spent in (a), denote by Levels(v) the number of iterations executed in FWDLASTLAYER, i.e., PENDANT spends Levels(v) symbolic steps in Line 5. If F \ SCC(v) = ∅ or L \ SCC(v) = ∅, we have Levels(v) = δ(SCC(v)). Otherwise, the next pivot v' is PICK'ed from L (Line 10). Consider a shortest path P: v ⇝ v', let (S_1, ..., S_k) be the SCCs of nodes along P (except v), and note that $\mathrm{Levels}(v) \le \sum_{i=1}^{k} \delta(S_i)$. Moreover, we have S_i ⊆ Bwd(v') for each i ∈ {1, ..., k}, and thus each S_i is not touched again by FWDLASTLAYER, except if S_i = SCC(v'), but this case is accounted for already. Summing over all such S_i across all pivots v, we have that $\sum_{v} \mathrm{Levels}(v) \le \sum_{S \in \mathrm{SCCs}(G)} \delta(S)$. Hence the total symbolic time spent for case (i) is bounded by $\sum_{S \in \mathrm{SCCs}(G)} 3\delta(S)$.


We now turn our attention to case (ii). Due to Lemma 3, V is forward-closed and D is a BSCC. By Lemma 5, the set B computed in Line 16 is SCC-closed. The number of symbolic steps is hence bounded by $\sum_{S \in \mathrm{SCCs}(B)} \delta(S)$. Finally, B is removed from the graph in the recursive call, hence it will not be processed again. Thus the total time for case (ii) is at most $\sum_{S \in \mathrm{SCCs}(G)} \delta(S)$.


4 Deadlock Detection 


We now outline a simple but effective preprocessing technique for BSCCs. 


Recall that a deadlock is a node v without outgoing edges, i.e., Post(v) = ∅. Observe that all deadlocks are BSCCs: formally, we have Fwd(v) = {v} = SCC(v), and thus the statement follows from Proposition 1 (the converse is, of course, not true in general). Thus, deadlock detection can be seen as a natural preprocessing step for any BSCC algorithm.


The key observation in this approach is that the set of all deadlocks can be computed efficiently, in only one symbolic step; this is achieved by Algorithm 5. In particular, the deadlock set is computed as D = V \ H, where H is the set of nodes u that have a successor. In turn, H can be computed by a single Pre operation on the entire node set. Finally, due to Proposition 2, the set Bwd(D) is guaranteed to contain no BSCCs other than those in D, and thus it can be removed. The resulting graph is then passed to the main BSCC algorithm.


Algorithm 5: Deadlock detection (preprocessing)
Input: A graph G = (V, E)
1 H = Pre(V, G)           // Compute all nodes that have a successor
2 D = V \ H               // Compute all deadlocks
3 B = BWD(D, G)           // Compute safe-to-remove nodes
4 output each node in D   // Output BSCCs
5 return G[V \ B]         // Return remaining graph for further computation
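Algorithm 5 can be sketched in Python on an explicit graph as follows; the dict-of-successors encoding and the helper names are assumptions for illustration. Note how the deadlock set itself costs a single `pre` call over the whole node set, while the basin is an ordinary backward fixpoint.

```python
from typing import Dict, Set, Tuple

Graph = Dict[int, Set[int]]  # node -> successors; explicit stand-in for a BDD


def pre(g: Graph, xs: Set[int], nodes: Set[int]) -> Set[int]:
    """Nodes in `nodes` with at least one successor in xs (one Pre step)."""
    return {u for u in nodes if g[u] & xs}


def bwd(g: Graph, xs: Set[int], nodes: Set[int]) -> Set[int]:
    """Backward closure of xs inside G[nodes]."""
    b, layer = set(), set(xs)
    while layer:
        b |= layer
        layer = pre(g, layer, nodes) - b
    return b


def remove_deadlocks(g: Graph) -> Tuple[Set[int], Graph]:
    v = set(g)
    h = pre(g, v, v)          # Line 1: all nodes that have a successor
    d = v - h                 # Line 2: deadlocks (each is a singleton BSCC)
    b = bwd(g, d, v)          # Line 3: basin of the deadlocks
    rest = v - b
    # Line 5: the induced subgraph G[V \ B], edges restricted to `rest`
    return d, {u: g[u] & rest for u in rest}
```

On `{1: {2}, 2: {1, 3}, 3: {4}, 4: {3}, 5: set(), 6: {5}, 7: {6, 8}, 8: {8}}`, node 5 is the only deadlock, its basin {5, 6, 7} is removed, and the self-loop node 8 survives, since a self-loop counts as a successor.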


5 Experiments 


Here we report on an implementation of PENDANT, including the deadlock- 
detection technique, and an experimental evaluation of its performance on a 
large dataset of standard model-checking benchmarks across various domains. 
Baselines. To assess the performance of PENDANT and deadlock detection, we compare them with BWDFWD (Algorithm 2), as well as the recently introduced interleaved transition guided reduction (ITGR) [6], which we have implemented in our setting. ITGR is applicable when the transition relation is partitioned into a number of smaller relations E = (R_1, ..., R_k) (as is the case in our setup), and works as a preprocessing step, much like our deadlock detection. At a high level, ITGR employs local reasoning for each relation R_i to identify sets of nodes that do not contain BSCCs. Such sets can be removed, reducing the size of the graph that is further processed by a BSCC-computation algorithm.


Research Questions. Our setup is centered around the following questions. 


RQ1 How does the performance of PENDANT compare to that of BWDFWD?

RQ2 How does deadlock detection impact the performance of PENDANT and BWDFWD?

RQ3 How does ITGR impact the performance of PENDANT and BWDFWD?

RQ4 How does the performance of PENDANT compare to the performance of BWDFWD when both use deadlock detection?

RQ5 How does ITGR impact the performance of PENDANT after deadlock detection?


Datasets. We use benchmarks from the following categories. 


— Petri Net models from MCC, the Model Checking Contest [22]. 
— DiVinE models from BEEM, the Benchmark of Explicit Models [27]. 
— Asynchronous Boolean Network models [3T1]2]. 


We do not apply any selection criteria, except discarding models that are too slow for all algorithms to handle in our timed experiments. This results in 553 models in total.¹ In each model, the edge relation is naturally partitioned into subrelations R_1, ..., R_k, following the structure of the high-level specifications (transitions in Petri Nets and DiVinE state machines, and reactions in the Boolean Networks). We use the language-independent model checker LTSmin [20] to generate symbolic graphs for the DVE and PNML models. Since LTSmin does not handle Boolean Networks, these graphs are generated by a custom parser. The time taken for graph generation is not included in the running time of each algorithm. We use the BDD package Sylvan as our symbolic representation [15].


Experimental setup. Our experiments are run on a Linux machine with 
2.4GHz CPU speed and 60GB of memory. We measure both symbolic steps 
and run time, but only present the results on symbolic steps here, as they reflect 
the true symbolic time-complexity of the algorithms, and are independent of the 
choice of the underlying BDD package. The results on time are qualitatively the 
same. Each run is timed out after 400 seconds, indicated as the graph taking 
10? symbolic steps on the figures. Since our input relation is partitioned into 
several sub-relations E = (R_1, ..., R_k), each Pre/Post operation incurs k symbolic steps (for all algorithms). Our setup is completely deterministic; however, certain operations, like PICK'ing a node, make arbitrary (but fixed) choices.

¹ Tool and data set available at https://doi.org/10.5281/zenodo.10427894
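To illustrate how the step count scales with the partition, here is a small Python sketch of a partitioned relation whose `post` charges one symbolic step per sub-relation R_i, so a single Post costs k steps. The class name and counter are hypothetical; in the actual setup each R_i would be a BDD and each image a BDD operation.

```python
from typing import Dict, List, Set

Relation = Dict[int, Set[int]]  # one sub-relation R_i: node -> successors under R_i


class PartitionedGraph:
    """Edge relation E given as sub-relations R_1, ..., R_k (hypothetical helper)."""

    def __init__(self, parts: List[Relation]):
        self.parts = parts
        self.steps = 0                      # symbolic-step counter

    def post(self, xs: Set[int]) -> Set[int]:
        """Image of xs under E = R_1 ∪ ... ∪ R_k; costs one step per R_i."""
        out: Set[int] = set()
        for r in self.parts:                # one image computation per sub-relation
            self.steps += 1
            for u in xs:
                out |= r.get(u, set())
        return out
```

With two sub-relations, every call to `post` adds 2 to the counter regardless of how many nodes are in the argument set, which is why step counts, rather than set sizes, are the natural cost measure here.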


Experimental results. We now present our experimental results for addressing 
the above research questions. Note that all figures are plotted in log-scale. 


[Figure: number of symbolic steps of PENDANT vs. BWDFWD (x-axis), log scale; series: DVE, PNML, Bool Net]

Fig. 2: The number of symbolic steps executed by PENDANT and BWDFWD.


RQ1: PENDANT vs BWDFWD. The performance of PENDANT and BWDFWD is shown in Fig. 2 across all three datasets. Both algorithms manage to handle many models within the time limit, though there are a few timeouts. We see that PENDANT is generally no slower than BWDFWD, with the clear exception of three timeout outliers. For the rest, the models that are slower for PENDANT sit only slightly above the x = y line, meaning that the slowdown is comparatively small. On the other hand, there are several models on which PENDANT is faster than BWDFWD, and the speedup increases as we go towards more demanding benchmarks (reaching more than two orders of magnitude). Finally, PENDANT times out on far fewer models than BWDFWD. Overall, PENDANT is measurably faster than BWDFWD, and this trend persists across all three datasets (DVE, PNML, and Boolean Networks).


RQ2: The impact of deadlock detection. The impact of deadlock detection on both PENDANT and BWDFWD is shown in Fig. 3. We see that deadlock detection improves the performance of both algorithms significantly. Indeed, detecting deadlocks requires only one symbolic step (per relation R_i), hence it is natural to expect that it does not slow down any algorithm, and has no effect on models that have no deadlocks. On the other hand, it leads to a measurable speedup on the models that have deadlocks, and the impact varies depending on the fraction of the graph that is removed during deadlock removal. Interestingly, deadlock detection also significantly reduces the number of timeouts for both PENDANT and BWDFWD. In conclusion, deadlock detection helps both algorithms.

Fast Symbolic Computation of Bottom SCCs 123

[Figure: number of symbolic steps, log scale; left: PENDANT vs. PENDANT + deadlock; right: BWDFWD vs. BWDFWD + deadlock; series: Deadlock, No Deadlock]

Fig. 3: The impact of deadlock detection on the number of symbolic steps executed by PENDANT (left) and BWDFWD (right). Data points are classified as those having at least one deadlock, and those having no deadlock.


RQ3: The impact of ITGR. The impact of ITGR on both PENDANT and BWDFWD is shown in Fig. 4. Perhaps surprisingly, we find that ITGR does not have a consistent effect: it can both speed up and slow down each of the algorithms. On closer inspection, we observe that ITGR has a positive effect on most Boolean Network models, which is indeed the context in which it was introduced [6]. On the other hand, it has both positive and negative effects on DVE and PNML models, and even makes both algorithms time out on instances that they could easily handle without ITGR.


RQ4: PENDANT vs BWDFWD, with deadlock detection. Since deadlock detection has a clear positive effect on both algorithms, it is natural to revisit RQ1 and ask about the performance of the two algorithms when both also use deadlock detection. The result is shown in Fig. 5. Deadlock detection makes the performance of the two algorithms more similar on many benchmarks (i.e., more data points lie close to the x = y line). However, PENDANT remains decisively faster on many models, and thus its benefit is not overshadowed by the positive impact of deadlock detection. On closer inspection, we see that PENDANT is faster on DVE and PNML models, but not on Boolean Networks. This is due to the fact that most Boolean Networks have many deadlocks, and thus the common deadlock-detection component simplifies such models considerably, making the remaining performance of the two algorithms similar.

[Figure: number of symbolic steps, log scale; left: PENDANT vs. PENDANT + itgr; right: BWDFWD vs. BWDFWD + itgr]

Fig. 4: The impact of ITGR on the number of symbolic steps executed by PENDANT (left) and BWDFWD (right).

[Figure: number of symbolic steps of PENDANT and BWDFWD, both with deadlock detection, log scale]

Fig. 5: The number of symbolic steps executed by PENDANT and BWDFWD, when also using deadlock detection.


[Figure: number of symbolic steps of PENDANT + deadlock vs. PENDANT + deadlock + itgr (x-axis), log scale; series include Bool Net]

Fig. 6: The impact of ITGR after using deadlock detection.


RQ5: The impact of ITGR after deadlock detection. Finally, in Fig. 6 we examine whether ITGR improves the performance of PENDANT after deadlock detection has run. Although ITGR improves the performance on a few models, it generally leads to a slowdown, as well as to more timeouts. Interestingly, ITGR has the fewest positive effects (on top of deadlock detection) for Boolean Network models, for which it was originally introduced. Since these models have many deadlocks, the fast deadlock-detection preprocessing simplifies them considerably, at which point the cost of ITGR outweighs its small (or nonexistent) benefit.


6 Conclusion 


We have introduced PENDANT, a new symbolic algorithm for computing BSCCs, as well as a deadlock-detection technique for this task. Though both PENDANT and the standard BWDFWD have O(n) symbolic-time complexity, our experimental results show that PENDANT is typically faster, and thus to be preferred for this task. Moreover, deadlock detection is an efficient and effective preprocessing technique for reporting singleton BSCCs (and removing their basin) before handing the computation to the general algorithm. Finally, the recently introduced ITGR, although effective on Boolean Network models, has mixed effects on DVE and PNML models, and its effect after deadlock detection is often (but not always) negative. Some opportunities for future research include introducing saturation techniques [34] to PENDANT, extending the algorithm to symbolically handle colored graphs, and better understanding the settings in which ITGR is effective.
Abstract. Formal verification is essential but challenging: even the best verifiers may produce wrong verification verdicts. Certifying verifiers enhance the confidence in verification results by generating a witness for other tools to validate the verdict independently. Recently, translating the hardware-modeling language BTOR2 to software, such as the programming language C or LLVM intermediate representation, has been actively studied and has facilitated verifying hardware designs with software analyzers. However, it remained unknown whether witnesses produced by software verifiers contain helpful information about the original circuits and how such information can aid hardware analysis. We propose a certifying and validating framework, Btor2-Cert, to verify safety properties of BTOR2 circuits, combining BTOR2-to-C translation, software verifiers, and a new witness validator, Btor2-Val, to answer the above open questions. Btor2-Cert translates a software violation witness to a BTOR2 violation witness; as the BTOR2 language lacks a format for correctness witnesses, we encode invariants in software correctness witnesses as BTOR2 circuits. The validator Btor2-Val checks violation witnesses by circuit simulation and correctness witnesses by validation via verification. In our evaluation, Btor2-Cert successfully utilized software witnesses to improve quality assurance of hardware. By invoking the software verifier CBMC on translated programs, it uniquely solved, with confirmed witnesses, 8 % of the unsafe tasks for which the hardware verifier ABC failed to detect bugs.


Keywords: Hardware verification · Software verification · Verification witnesses · Witness validation · Word-level circuit · BTOR2 · SMT · SAT


1 Introduction 


Certifying algorithms [1] generate a certificate alongside the computed solution such that proof checkers can independently validate the solution to increase users' trust and the explainability of the results. In the model-checking community, a certificate to explain a verdict for a verification task is called a witness [2], and verifiers able to generate witnesses are called certifying verifiers. Witnesses can be independently checked by witness validators to confirm the verification results.

Fig. 1: A certifying hardware-verification framework using software analyzers

Figure 1a shows a generic workflow for certifying and validating model checking. After a certifying verifier produces a verdict v and a witness w on a task T, a witness validator takes T and w as input and checks if the information in w is enough to reestablish the results of the verifier on T. The outcome of the verifier is certified if its verdict v and the validator's verdict v' are consistent. In the rest of the paper, we use certifying model checking interchangeably with certifying and validating model checking when it is clear from the context that a framework contains both a certifying verifier and a witness validator. For reachability properties, if a model violates a safety specification, a violation witness [3] may contain external inputs to the model to replay the erroneous execution trace. If the safety specification is satisfied, a correctness witness [4] could record invariants of the model to reconstruct a safety proof. Section 2 presents a brief survey on witness validation in the formal-methods community.
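The check performed on a violation witness can be illustrated with a toy Python sketch: the witness fixes initial register values and one input assignment per cycle, validation replays it to see whether a bad state is reached, and the verdict counts as certified when verifier and validator agree. The `Circuit` and `ViolationWitness` classes below are hypothetical simplifications, not the BTOR2 data model or the interface of any existing simulator or validator.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, int]   # register name -> value
Inputs = Dict[str, int]  # input name -> value for one cycle


@dataclass
class Circuit:
    """Toy word-level sequential circuit (hypothetical, for illustration)."""
    init: State                               # initialised registers
    step: Callable[[State, Inputs], State]    # next-state function
    bad: Callable[[State, Inputs], bool]      # safety-violation predicate


@dataclass
class ViolationWitness:
    initial: State = field(default_factory=dict)        # uninitialised registers
    inputs: List[Inputs] = field(default_factory=list)  # one assignment per cycle


def replay(c: Circuit, w: ViolationWitness) -> bool:
    """Return True iff simulating the witness reaches a bad state."""
    state = {**c.init, **w.initial}
    for inp in w.inputs:
        if c.bad(state, inp):
            return True
        state = c.step(state, inp)
    return False


def certified(verifier_verdict: str, validator_verdict: str) -> bool:
    """A verdict is certified if the validator reaches the same verdict."""
    return verifier_verdict == validator_verdict
```

For an 8-bit counter that starts at 0 and is bad when it reaches 3, a witness with four cycles of input `inc = 1` replays to the bad state, whereas four cycles of `inc = 0` do not.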

Recently, hardware-to-software translators [5, 6] from the hardware-modeling language BTOR2 [7], a prevailing format for word-level hardware model checking used in the Hardware Model Checking Competitions (HWMCC) [8, 9], have been proposed to facilitate the application of software analyzers to hardware circuits. The tools Btor2C [5] and Btor2MLIR [6] translate BTOR2 circuits to behaviorally equivalent imperative software in the programming language C [10] and the intermediate representation used by the compilation toolchain LLVM [11], respectively, and enable any software analyzer for C or LLVM-bytecode programs to inspect BTOR2 circuits. In an experiment on more than 1000 BTOR2 circuits [5], software verifiers for C programs were shown to detect more bugs than the best hardware model checkers by preprocessing the original circuit with Btor2C and analyzing the translated C program. However, in this previous work [5], only the verdicts of software verifiers, but not the witnesses, which contain the information and reasoning behind a verdict, are transferred back to the hardware domain. In other words, the results of software verifiers on BTOR2 circuits are not certified, and hence hardware designers may not trust software verifiers for analyzing their circuits.


1.1 Our Motivation and Contributions 


Motivated to mitigate the aforementioned threat to reliability and leverage the 
capability of software verifiers to generate witnesses, we investigate the follow- 
ing open questions in this work: (1) whether the software witnesses for trans- 
lated programs contain useful information about original circuits and (2) how 
to employ the information to aid hardware quality assurance. Our contributions 
are summarized below. 


A Certifying Framework for HW Verification with SW Analyzers. Figure 1b shows the proposed certifying and validating hardware-verification framework based on software analyzers to approach the open questions. The framework translates a hardware-verification task T_H to a software task T_S and applies software verifiers to T_S. After obtaining a software witness w_S, it encodes relevant information from w_S in the form of a hardware witness w_H and validates the verdict returned by software verifiers with w_H. We instantiate the framework in a tool, Btor2-Cert, for verifying BTOR2 circuits with certified verdicts. In addition to preprocessing BTOR2 circuits with Btor2C [5] and invoking model checkers for the translated C programs, such as CPAchecker [12], CBMC [13], ESBMC [14], and UAutomizer [15], Btor2-Cert features a translator from software witnesses to BTOR2 witnesses and a witness validator, Btor2-Val, to check BTOR2 witnesses. Section 4 shows our tool architecture.

Note that the framework in Fig. 1b is not limited to Btor2C and verifiers for C programs. For example, one could also materialize the concept with the translator Btor2MLIR [6], analyzers for LLVM-bytecode programs [11], such as KLEE [16], SMACK [17], and SeaHorn [18], and a corresponding LLVM-to-BTOR2 witness translator. There also exist translators [19, 20, 21] from Verilog [22] circuits to C programs or SMV [23] models. We choose Btor2C for task translation because many verifiers for C programs participating in the International Competition on Software Verification (SV-COMP) [24] can generate witnesses in a standardized and exchangeable format [2].


A Translator from Software Witnesses to BTOR2 Witnesses. Btor2-Cert translates software violation witnesses in the format used in SV-COMP [24] to the format defined by the BTOR2 language [7]. For tasks satisfying their specifications, as there is no native format for correctness witnesses in BTOR2, Btor2-Cert extracts the invariants in software witnesses and represents them as BTOR2 circuits, whose inputs refer to the state variables of the original circuit. The advantages of not inventing a new format but reusing the existing BTOR2 language are twofold: First, BTOR2 extends SMT-LIB 2 [25] and provides the required operations on the word level to accommodate most invariants derived by software verifiers. Second, BTOR2 is supported by many hardware model checkers participating in HWMCC [8, 9] and offers a suite, Btor2Tools [26], of utility tools for parsing and simulation, which simplifies further development around the BTOR2 format.
A Validator for BTOR2 Witnesses. To validate the witnesses for BTOR2 circuits, we develop Btor2-Val, a portfolio-based witness validator involving hardware simulators and verifiers. Btor2-Val validates violation witnesses by invoking the simulator BtorSim from Btor2Tools [26]. For correctness witnesses, Btor2-Val follows the validation-via-verification approach [27] by instrumenting the original BTOR2 circuit with the circuit representing the invariant and verifying the instrumented circuit. The instrumented circuit satisfies the modified safety property if the invariant can be used to reconstruct the proof of correctness. Hardware verifiers are employed to check the instrumented circuits. Btor2-Val leverages CoVeriTeam [28], a framework for cooperative verification, to coordinate the underlying hardware simulators and verifiers.
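Btor2-Val checks correctness witnesses by instrumenting the circuit and running hardware verifiers; the explicit-state Python sketch below only illustrates, under simplifying assumptions, why an invariant can certify safety. It swaps in the textbook inductive-invariant check (initiation, consecution, safety) over a finite state space; the function name and the toy system are hypothetical, and invariants taken from software witnesses need not be inductive on their own.

```python
from typing import Callable, Iterable, Set

State = int  # toy state encoding (hypothetical)


def is_safety_certificate(
    states: Iterable[State],
    init: Set[State],
    succ: Callable[[State], Set[State]],
    bad: Callable[[State], bool],
    inv: Callable[[State], bool],
) -> bool:
    """True iff `inv` is an inductive invariant that excludes all bad states."""
    states = list(states)
    initiation = all(inv(s) for s in init)                          # Init ⊆ Inv
    consecution = all(inv(t) for s in states if inv(s) for t in succ(s))  # Inv closed under succ
    safety = not any(inv(s) and bad(s) for s in states)             # Inv ∩ Bad = ∅
    return initiation and consecution and safety
```

For a counter over {0, 1, 2, 3} that starts at 0 and adds 2 modulo 4, with state 1 considered bad, the invariant "the state is even" passes all three checks, while the trivial invariant "true" fails the safety check and "the state is 0" fails consecution.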


Enhancing Confidence in SW Verifiers on HW Designs. We evaluate Btor2-Cert on more than 1 000 BTOR2 circuits to study its capability of providing certified verification results using software analyzers. In the experiment,

• the witness translator was able to translate every violation witness and 97 % of the correctness witnesses produced by software verifiers,

• the combination of witness translation and Btor2-Val outperformed mature software witness validators in both effectiveness and efficiency, and

• Btor2-Cert provided certified results computed by software verifiers on some BTOR2 circuits that the best hardware model checkers failed to verify.


The conceptual message conveyed by BTOR2-CERT is that software analyzers can
derive useful information about circuits and complement conventional hardware
model checkers with trustworthy results. Our contributions thus make it more
practical to analyze hardware designs with software verifiers. The proposed framework
BTOR2-CERT is open-source and available online (more information in Sect. 4).


2 Related Work 


Generating and validating witnesses for analysis results has been studied through-
out the entire verification toolchain, from satisfiability solvers to model checkers.
In the following, we briefly review witness validation and compare our work to a 
recent certifying verification framework [29, 30, 31] targeting k-induction [32]. 


2.1 Witness Validation 


For satisfiability solving, the competitions on propositional SAT solvers [33, 34]
use the DRAT format [35] to encode certificates of unsatisfiability and independent
validators [36, 37] to check the proofs. The competitions on SMT solving
check the models of satisfiable formulas with the tool DOLMEN [38]. Certification
of quantified Boolean formulas has also been investigated [39, 40].

For model checking, an early work [41] suggests generating a deductive proof 
from the run of model checkers with extra bookkeeping steps. In HWMCC [8, 9], 
the BTOR2 [7] language defines a format for violation witnesses as a sequence of 
input values and initial values for registers that lead to an erroneous execution. 
However, BTOR2 has no format for correctness witnesses. The competitions on 
automated termination analysis [42] use the format CPF [43], and in SV-COMP [24],
a GraphML-based format [2] is used to describe software witnesses as automata. 
In addition to the properties commonly used in tool competitions, a recent work 
extends proof generation of model checking to full LTL properties [44]. 

Numerous approaches have been invented for validating software witnesses.
Methods to validate correctness witnesses include a parallel extension [45] of
k-induction, program instrumentation with invariants and re-verification [27]
(referred to as validation via verification in the publication), and program
decomposition into several straight-line sub-programs [46]. Execution-based
validation [47] is an elegant approach to validate violation witnesses. It extracts a
sequence of external input values from a violation witness and employs debuggers
or simulators to confirm that an error location is reachable. Our witness validator
BTOR2-VAL leverages validation via verification and execution-based validation.
More details are given in Sect. 5 and Sect. 6, respectively. In our evaluation, the
proposed validator BTOR2-VAL (together with the witness translator) competed
well against the winners in the witness-validation track of SV-COMP 2023 [24].


2.2 Validating k-Inductiveness of Properties in Hardware Models


Given a sequential circuit and a number k as input, the tool CERTIFAIGER [29, 30]
aims to validate that the safety property of the input circuit is k-inductive.
Composing a k-induction-based hardware model checker and CERTIFAIGER yields a
certifying and validating model checker (as depicted in Fig. 1a), whose witnesses
are the inductive length k. The key differences between the proposed framework
in Fig. 1b and this framework [29, 30] for k-inductiveness are as follows.

First, our validator BTOR2-VAL expects a candidate invariant in the correctness
witness but does not restrict the algorithms used by software verifiers. In contrast,
CERTIFAIGER expects a candidate inductive length k and thus can only validate
results of k-induction-based model checkers. Second, to validate witnesses,
BTOR2-VAL relies on validation via verification [27] and invokes model checkers
because the candidate invariant may not be inductive. In comparison, CERTIFAIGER
avoids model checking and reduces the validation problem to several SAT checks
since it assumes the safety property to be k-inductive. To sum up, our framework
complements the existing work [29, 30] by considering candidate invariants as
witnesses. Its applicability to algorithms other than k-induction comes at the
expense of a potentially more complex validation procedure. CERTIFAIGER has been
further extended to accommodate temporal decomposition [48] as preprocessing to
simplify the verification tasks [31], which has not yet been considered in our
framework and is an important direction of future work.


3 Background 


To facilitate the discussion in the rest of this manuscript, we provide prerequisite
knowledge on model checking and witness validation from the literature.

A state-transition system M is described by two predicates I(s) and TR(s, s')
over states s and s' of M, which encode the initial states and the transition relation
of M, respectively (TR(s, s') is true if s can transition to s' in one step). An
invariant Inv(s) of a system M is a predicate over states of M such that Inv(s)
is true for every reachable state s of M. We denote "Inv is an invariant of M"
by $M \models Inv$. A safety-verification task consists of a state-transition system M
and a safety property P(s). We say a safety-verification task (or a verification
task for short) is safe if $M \models P$ and unsafe otherwise. Given a verification task
of M and P, the problem of model checking asks whether $M \models P$ or not. In
practice, state-transition systems manifest themselves as sequential digital circuits
or programs. In the following, we briefly introduce the modeling languages used
in HWMCC [8, 9] and SV-COMP [24] with a running example.


(a) BTOR2 circuit

   1 sort bitvec 8
   2 sort bitvec 1
   3 constd 1 42
   4 constd 1 2
   5 zero 1
   6 state 1 ; a
   7 state 1 ; b
   8 input 1 ; in
   9 init 1 6 4 ; a init to 2
  10 init 1 7 5 ; b init to 0
  11 eq 2 6 5 ; a == 0
  12 eq 2 7 4 ; b == 2
  13 eq 2 8 3 ; in == 42
  14 and 2 11 12
  15 and 2 13 14
  16 bad 15
  17 one 1
  18 srl 1 6 17
  19 xor 1 7 17
  20 next 1 6 18
  21 next 1 7 19

(b) C program (simplified for demo)

   1 extern void abort(void);
   2 extern unsigned char nondet_uchar();
   3 void main() {
   4   typedef unsigned char SORT_1;
   5   SORT_1 a = nondet_uchar();
   6   SORT_1 b = nondet_uchar();
   7   a = 2;
   8   b = 0;
   9   for (;;) {
  10     SORT_1 in = nondet_uchar();
  11     if (a == 0 && b == 2 && in == 42) {
  12       ERROR: abort();
  13     }
  14     a = a >> 1;
  15     b = b ^ 1;
  16   }
  17 }

Fig. 2: An example BTOR2 circuit and its translated C program


3.1 The BTOR2 Language for Word-Level Circuits


The BTOR2 hardware-modeling language [7] was invented to describe model-checking
problems of word-level sequential circuits. It extends the bit-level AIGER
format [49] with data sorts of bit-vectors and arrays and inherits word-level
operations from SMT-LIB 2 [25]. Figure 2a shows an example BTOR2 circuit.
The circuit has two state variables a and b and an external input in, defined
in lines 6-8, respectively. The states and input are bit-vectors of width 8 (the
sort bitvec 8 defined in line 1). Variables a and b are initialized to 2 and 0,
respectively. In each iteration, variable a is right-shifted by 1 bit (line 18), and
variable b is bitwise XOR-ed with 1 (line 19). As indicated by the keyword bad in
line 16, a property violation occurs if variable a equals 0, variable b equals 2,
and input in equals 42. The example BTOR2 circuit satisfies its safety property
because variable b only alternates between 0 and 1 and thus never equals 2.
However, if variable b is initialized to a different value at line 10 (marked in red),
say 2, a property violation will be triggered after two steps of state transition if
42 is given as the external input in the last iteration.
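To make this behavior concrete, the following minimal Python sketch hard-codes the transition function of the circuit in Fig. 2a instead of parsing BTOR2; the helper names (step, is_bad, first_bad_cycle) are hypothetical. It traces the circuit for both initial values of b, feeding 42 as the input in every cycle.

```python
MASK = 0xFF  # states and input are 8-bit bit-vectors (sort 1 in Fig. 2a)

def step(a, b):
    """One transition: a is right-shifted by 1 (line 18), b is XOR-ed with 1 (line 19)."""
    return (a >> 1) & MASK, (b ^ 1) & MASK

def is_bad(a, b, inp):
    """The bad condition of line 16: a == 0 && b == 2 && in == 42."""
    return a == 0 and b == 2 and inp == 42

def first_bad_cycle(b_init, inputs, max_cycles=16):
    """Return the first cycle in which bad holds, or None within max_cycles.

    `inputs` maps a cycle index to the value of `in`; missing cycles default to 0.
    """
    a, b = 2, b_init  # line 9 initializes a to 2; line 10 initializes b
    for cycle in range(max_cycles):
        if is_bad(a, b, inputs.get(cycle, 0)):
            return cycle
        a, b = step(a, b)
    return None

always_42 = {cycle: 42 for cycle in range(16)}
print(first_bad_cycle(0, always_42))  # prints None: b alternates between 0 and 1
print(first_bad_cycle(2, always_42))  # prints 2: a == 0 and b == 2 after two steps
```

With b initialized to 2, the pair (a, b) evolves as (2, 2), (1, 3), (0, 2), so the violation appears in cycle 2 (counting from 0), which matches the "two steps of state transition" described above.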


Translating BTOR2 Circuits to C Programs. BTOR2C [5] is a lightweight
translator from the BTOR2 language to the programming language C [10]. It
encodes BTOR2 data sorts with unsigned integers and static arrays, expresses
BTOR2 operations with corresponding operators of C, and uses an infinite loop
to model the execution of a sequential circuit. Given the example BTOR2 circuit
in Fig. 2a as input, BTOR2C generates the translated C program¹ shown in Fig. 2b.
BTOR2C follows the rules of SV-COMP [24] to encode safety-verification tasks
for C programs, so compositional hardware model checkers for BTOR2 circuits can
be readily formed by combining software verifiers participating in SV-COMP as
verification engines and BTOR2C as preprocessing. In an extensive experiment [5],
software verifiers were shown to detect more bugs in BTOR2 circuits than the best
conventional hardware model checkers, such as ABC [50] and AVR [51].
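The operator mapping can be pictured with a deliberately simplified Python sketch; it is not BTOR2C's actual code. It turns one binary BTOR2 operation line into a C assignment. The naming scheme var_<nid> and SORT_<sid> loosely follows the SORT_1 typedef in Fig. 2b and is otherwise an assumption, the operator table covers only the operators of the running example, and the sketch ignores the masking to the sort width that a faithful translation of bit-vectors narrower than their C type would need.

```python
# Hypothetical subset of BTOR2 binary operators and their C counterparts.
C_OPERATORS = {
    "eq": "==", "and": "&", "xor": "^", "srl": ">>", "ugte": ">=", "ulte": "<=",
}

def btor2_line_to_c(line):
    """Translate one binary BTOR2 operation, e.g. '19 xor 1 7 17', into a C assignment."""
    tokens = line.split(";")[0].split()  # drop a trailing comment, if any
    op = tokens[1]
    if op not in C_OPERATORS:
        raise NotImplementedError(f"operator {op!r} is not covered by this sketch")
    nid, _, sort, lhs, rhs = tokens[:5]
    # Every node becomes a variable var_<nid> of type SORT_<sort id>; no masking.
    return f"SORT_{sort} var_{nid} = var_{lhs} {C_OPERATORS[op]} var_{rhs};"

for line in ["11 eq 2 6 5 ; a == 0", "18 srl 1 6 17", "19 xor 1 7 17"]:
    print(btor2_line_to_c(line))
```

For the three lines of Fig. 2a above, this prints SORT_2 var_11 = var_6 == var_5;, SORT_1 var_18 = var_6 >> var_17;, and SORT_1 var_19 = var_7 ^ var_17;. Such intermediate variables are what footnote 1 refers to as omitted from Fig. 2b.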


3.2 Representing Software Witnesses as Automata 


Software witnesses can be represented as protocol automata [2], describing program 
invariants needed to construct a safety proof or program paths leading to a property 
violation. A letter in the alphabet of such a protocol automaton is a pair of a set 
of program edges and a condition over program variables. The set of program 
edges indicates the control flow, and the condition can be used to restrict the 
state space of the program. Program invariants that should hold at a certain 
program location can be annotated to a protocol automaton. In the following, we 
give an example correctness witness for the C program in Fig. 2b and an example 
violation witness for the same C program but with line 8 commented out. 


Correctness Witnesses. Figure 3 shows an example correctness witness for the
C program in Fig. 2b. The correctness witness shows that a program invariant
b>=0 && b<=1 is established once line 8 is executed. Indeed, variable b switches
between 0 and 1 after being initialized, and b>=0 && b<=1 is an invariant at the
loop head of the program. A program invariant is stored as a C expression in a
software witness and is hence potentially more compact than invariants represented
in other formalisms, e.g., a bit-level AIGER [49] circuit.

Fig. 3: A correctness witness
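A protocol automaton can be modeled with a few plain data types. The Python sketch below is a simplified stand-in for the GraphML-based format [2], not that format itself: a transition carries a set of program edges (identified here just by source line) and a condition, and states may carry an invariant. It encodes a witness shaped like the one described for Fig. 3, where executing line 8 leads to a state annotated with b>=0 && b<=1. The class names, the method invariant_after, and the exact shape of the automaton are our assumptions.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Transition:
    source: str
    target: str
    lines: frozenset       # program edges, identified here by their source line
    condition: str = "1"   # condition over program variables ("1" means true)

@dataclass
class WitnessAutomaton:
    initial: str
    transitions: list
    invariants: dict = field(default_factory=dict)  # state -> C expression

    def invariant_after(self, line):
        """Return the invariant of a state entered when `line` executes, if any."""
        for t in self.transitions:
            if line in t.lines and t.target in self.invariants:
                return self.invariants[t.target]
        return None

# Hypothetical encoding of a correctness witness in the spirit of Fig. 3.
witness = WitnessAutomaton(
    initial="q0",
    transitions=[Transition("q0", "q1", frozenset({8}))],
    invariants={"q1": "b>=0 && b<=1"},
)
print(witness.invariant_after(8))  # prints b>=0 && b<=1
```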


Violation Witnesses. Figure 4 shows an example violation witness for the
modified C program with variable b uninitialized (by commenting out line 8
in Fig. 2b). The violation witness shows how to reach the error in line 12 of the
C program. First, it assumes the value of variable b to be 2 via the condition
when line 6 is executed. Second, it moves on to the next state the first two times
line 10 is executed. Third, it assumes the external input to be 42 when line 10
is executed for the third time. Indeed, the error in line 12 can be reached if
variable b gets an initial value of 2 and the external input equals 42 in the third
loop iteration.

Fig. 4: A violation witness


1 The intermediate variables in the actual output program of BTOR2C are omitted.
[Figure: witness translation and validation flows]
(a) Validating correctness witnesses by circuit instrumentation and verification:
the invariant extractor (witness translation) turns the correctness witness of the
C program into a witness circuit; the circuit instrumentor of BTOR2-VAL combines
it with the BTOR2 circuit according to the invariant quality; a HW verifier checks
the instrumented circuit(s) and returns a verdict.
(b) Validating violation witnesses by circuit simulation: the input/state value
extractor (witness translation) turns the violation witness into a BTOR2 violation
witness; the HW simulator (BTORSIM) runs it on the BTOR2 circuit and returns
a verdict.

Fig. 5: Witness translation and validation in BTOR2-CERT and BTOR2-VAL


4 Architecture of BTOR2-CERT and BTOR2-VAL


We instantiate the proposed certifying and validating hardware-verification
framework in Fig. 1b as BTOR2-CERT² with the BTOR2-to-C translator BTOR2C [5],
model checkers for C programs [52] that can produce verification witnesses in the
format discussed in Sect. 3, a C-to-BTOR2 witness translator, and the witness
validator BTOR2-VAL. Figure 5 shows the translation and validation flows for
correctness (in Fig. 5a) and violation witnesses (in Fig. 5b). Both the translator
and the validator BTOR2-VAL for BTOR2 witnesses are implemented in Python 3.
BTOR2-VAL is based on a portfolio of hardware verifiers and simulators, with
different tools coordinated by the cooperative-verification framework COVERITEAM [28].


4.1 Validating Correctness Witnesses 


Given a safe BTOR2 circuit, its translated C program, and a correctness witness
produced by some software verifier, BTOR2-CERT certifies the results of the software
verifier in two steps, as depicted in Fig. 5a. In the first step of witness translation,
BTOR2-CERT extracts the invariant at the loop head of the C program and
represents it as a BTOR2 circuit. This BTOR2 circuit is called a witness circuit,
and its primary inputs refer to the state variables of the original circuit.
Second, in the validation step, BTOR2-VAL takes as input the original circuit, the
witness circuit, and a user-defined parameter called invariant quality that specifies
the level of strictness imposed on the invariant. BTOR2-VAL offers three levels
of invariant quality to users and instruments the original circuit according to the
selected level. Hardware verifiers are invoked on the instrumented circuit and will
deem it safe if the invariant meets the specified invariant quality for reconstructing
a safety proof. The details of validating correctness witnesses are presented in Sect. 5.


2 https://gitlab.com/sosy-lab/software/btor2-cert
4.2 Validating Violation Witnesses


Given an unsafe BTOR2 circuit, its translated C program, and a violation witness
produced by some software verifier, BTOR2-CERT certifies the results of the software
verifier in two steps, as depicted in Fig. 5b. In the first step of witness translation,
BTOR2-CERT extracts the values for external inputs and uninitialized states from
the software violation witness and encodes the information as a BTOR2 violation
witness [7]. Second, in the validation step, BTOR2-VAL invokes BTORSIM [26], a
simulator for BTOR2 circuits, to decide whether the BTOR2 violation witness can
trigger a bug in the original circuit. The details of validating violation witnesses
are presented in Sect. 6.


5 Certifying Results of Software Verifiers: Correctness 


In this section, we describe how BTOR2-CERT certifies verification results for safe
verification tasks. The BTOR2 circuit and its translated C program in Fig. 2 
as well as the software correctness witness in Fig. 3 will be used to explain the 
translation and validation of correctness witnesses, as outlined in Fig. 5a. 


5.1 Witness Translation 


Given a software correctness witness with a predicate annotated at the loop
head of the translated C program, which some software verifier claims to be an
invariant,³ BTOR2-CERT considers the predicate as a candidate invariant for the
original BTOR2 circuit and extracts it to reconstruct a safety proof. We encode
the candidate invariant, written as an expression in the programming language C,
into a combinational BTOR2 circuit whose inputs refer to the state variables of
the original BTOR2 circuit and whose unique output asserts the predicate.
Translating C expressions into BTOR2 circuits is feasible thanks to the word-level
data sorts and operations in the BTOR2 language [7]. We name the combinational
BTOR2 circuit a witness circuit and refer to it as a BTOR2 correctness witness.
Note that our notion of a witness circuit is different from CERTIFAIGER's definition
of a k-witness circuit [29], which is a sequential circuit simulating k-step execution
of the original circuit in one step. Figure 6 shows the witness circuit generated
from the software correctness witness in Fig. 3. The input defined in line 5 refers
to state variable b of the BTOR2 circuit in Fig. 2a. The output defined in line 9
asserts the candidate invariant b >= 0 && b <= 1.

  1 sort bitvec 8
  2 sort bitvec 1
  3 zero 1
  4 one 1
  5 input 1 ; state "b"
  6 ugte 2 5 3 ; b >= 0
  7 ulte 2 5 4 ; b <= 1
  8 and 2 6 7
  9 output 8

Fig. 6: A witness circuit
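The following Python sketch shows one way such an encoding could work for a tiny fragment of C: comparisons of 8-bit unsigned variables with non-negative constants, joined by &&. It takes a hand-built expression tree instead of parser output, and the function name witness_circuit and the tuple shape of the tree are hypothetical. Signedness and C's integer promotions are ignored, which is harmless here because unsigned char values are never negative. For the candidate invariant b >= 0 && b <= 1, it emits the nine lines of the witness circuit in Fig. 6 (without comments).

```python
# Hypothetical mapping from C comparison operators to unsigned BTOR2 operators.
COMPARISONS = {">=": "ugte", "<=": "ulte", ">": "ugt", "<": "ult", "==": "eq", "!=": "neq"}

def witness_circuit(expr):
    """Encode a C predicate over 8-bit unsigned state variables as BTOR2 lines.

    `expr` is a nested tuple (op, left, right); leaves are variable names (str)
    or non-negative integer constants. Supported ops: the comparisons above and '&&'.
    """
    lines = ["1 sort bitvec 8", "2 sort bitvec 1"]

    def emit(text):
        lines.append(f"{len(lines) + 1} {text}")
        return len(lines)  # node id of the line just added

    consts, names = set(), []

    def collect(e):
        if isinstance(e, int):
            consts.add(e)
        elif isinstance(e, str):
            if e not in names:
                names.append(e)
        else:
            collect(e[1])
            collect(e[2])

    collect(expr)

    leaf = {}
    for c in sorted(consts):  # constants first, as in Fig. 6
        leaf[c] = emit({0: "zero 1", 1: "one 1"}.get(c, f"constd 1 {c}"))
    for name in names:  # one primary input per referenced state variable
        leaf[name] = emit(f'input 1 ; state "{name}"')

    def encode(e):
        if not isinstance(e, tuple):
            return leaf[e]
        op, left, right = e
        lhs, rhs = encode(left), encode(right)
        gate = "and" if op == "&&" else COMPARISONS[op]
        return emit(f"{gate} 2 {lhs} {rhs}")

    emit(f"output {encode(expr)}")
    return lines

print("\n".join(witness_circuit(("&&", (">=", "b", 0), ("<=", "b", 1)))))
```

Emitting all constants and inputs before the operations keeps node ids stable and mirrors the layout of Fig. 6; a real translator working on full C expressions would also need to handle arithmetic, casts, and other widths.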


5.2 Witness Validation via Verification 


Following the idea of validation via verification [27], the validator BTOR2-VAL in
BTOR2-CERT checks BTOR2 correctness witnesses by instrumenting the original
circuit with the witness circuit and invoking hardware model checkers. It
distinguishes three levels of quality for a candidate invariant computed by software
verifiers. According to the notation introduced in Sect. 3, we denote the
state-transition system of the original BTOR2 circuit by M, with initial states I(s), a
transition relation TR(s, s'), and a safety property P(s). A predicate Inv(s) is


• an invariant if $M \models Inv$,

• a safe invariant if $M \models Inv$ and $Inv(s) \Rightarrow P(s)$, and

• a safe and inductive invariant if (1) $Inv(s) \Rightarrow P(s)$, (2) $I(s) \Rightarrow Inv(s)$, and
(3) $Inv(s) \wedge TR(s, s') \Rightarrow Inv(s')$.


Table 1: Candidate invariants at the loop head of the program in Fig. 2b

Predicate        Quality                             Reason
$\bot$           not invariant                       $M \models Inv$ fails.
$\top$           invariant but unsafe                $Inv \Rightarrow P$ fails.
b!=2             safe invariant but not inductive    $Inv(s) \wedge TR(s, s') \Rightarrow Inv(s')$ fails.
b>=0 && b<=1     safe and inductive invariant        All checks succeed.


3 Many mature verifiers in SV-COMP derive invariants at loop-head locations.


In the literature [29], the three conditions for safe and inductive invariants are 
also named consistency, initiation, and consecution, respectively. Table 1 shows 
four predicates and highlights their respective quality as an invariant at the loop 
head of the program in Fig. 2b (P is the negated error condition). 

BTOR2-VAL takes the original BTOR2 circuit, the witness circuit, and a
user-specified invariant quality for the correctness witness as input and instruments
the original circuit accordingly. To check whether Inv(s) is an invariant that helps to
reestablish a proof of P, BTOR2-VAL combines the witness circuit and the original
circuit by connecting the state variables of the original circuit to the corresponding
inputs of the witness circuit. That is, BTOR2-VAL builds a circuit that encodes
$M \models Inv \wedge P$. The instrumented circuit is given to hardware model checkers,
which will utilize the information provided by the witness circuit to find a proof of
correctness or refute the predicate if it is not an invariant. Note that the verification
time of the instrumented circuit is expected to be shorter than that of the original
circuit because the predicate can guide the search of hardware model checkers.
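As a rough illustration of this instrumentation, the Python sketch below splices a witness circuit into the original circuit at the level of BTOR2 text. It reuses matching sort declarations, renumbers the witness nodes after the last original node, connects each witness input to a given state node, and replaces the original bad property with one that fires whenever Inv is false or the original bad condition holds, i.e., exactly when Inv ∧ P is violated. The function name instrument and the input_to_state mapping are assumptions. Only operators whose arguments are all node ids are handled (constd, for example, is not), and BTOR2-VAL's actual instrumentation may differ in its details.

```python
# Fig. 2a and Fig. 6 as lists of BTOR2 lines (comments and symbols trimmed).
ORIGINAL = """1 sort bitvec 8
2 sort bitvec 1
3 constd 1 42
4 constd 1 2
5 zero 1
6 state 1 a
7 state 1 b
8 input 1 in
9 init 1 6 4
10 init 1 7 5
11 eq 2 6 5
12 eq 2 7 4
13 eq 2 8 3
14 and 2 11 12
15 and 2 13 14
16 bad 15
17 one 1
18 srl 1 6 17
19 xor 1 7 17
20 next 1 6 18
21 next 1 7 19""".splitlines()

WITNESS = """1 sort bitvec 8
2 sort bitvec 1
3 zero 1
4 one 1
5 input 1 ; state "b"
6 ugte 2 5 3
7 ulte 2 5 4
8 and 2 6 7
9 output 8""".splitlines()

def instrument(original, witness, input_to_state):
    """Return the lines of a circuit whose bad property fires iff Inv and P do not both hold."""
    out, bad_args, sort_ids, max_nid = [], [], {}, 0
    for line in original:
        tokens = line.split(";")[0].split()
        nid, op = int(tokens[0]), tokens[1]
        max_nid = max(max_nid, nid)
        if op == "bad":
            bad_args.append(int(tokens[2]))  # folded into the new property below
            continue
        if op == "sort":
            sort_ids[" ".join(tokens[2:])] = nid
        out.append(line)
    if len(bad_args) != 1:
        raise ValueError("this sketch handles exactly one bad property")

    next_nid = max_nid + 1

    def fresh(text):
        nonlocal next_nid
        out.append(f"{next_nid} {text}")
        next_nid += 1
        return next_nid - 1

    node_map, inv = {}, None
    for line in witness:
        tokens = line.split(";")[0].split()
        nid, op = int(tokens[0]), tokens[1]
        if op == "sort":
            node_map[nid] = sort_ids[" ".join(tokens[2:])]  # reuse the original sort
        elif op == "input":
            node_map[nid] = input_to_state[nid]  # read the state variable directly
        elif op == "output":
            inv = node_map[int(tokens[2])]
        else:  # sort id and operands are all node ids for the supported operators
            operands = " ".join(str(node_map[int(t)]) for t in tokens[2:])
            node_map[nid] = fresh(f"{op} {operands}")

    bool_sort = sort_ids["bitvec 1"]
    not_inv = fresh(f"not {bool_sort} {inv}")
    violated = fresh(f"or {bool_sort} {not_inv} {bad_args[0]}")
    fresh(f"bad {violated}")
    return out

instrumented = instrument(ORIGINAL, WITNESS, {5: 7})  # witness input 5 reads state b (node 7)
print("\n".join(instrumented[-8:]))
```

For the running example, the appended nodes are 22 zero 1 through 26 and 2 24 25 (the renumbered witness circuit reading state node 7), followed by 27 not 2 26, 28 or 2 27 15, and 29 bad 28.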

To implement the consistency, initiation, and consecution checks for safe
or inductive invariants, BTOR2-VAL also relies on circuit instrumentation and
hardware model checkers. Although the three checks are in essence satisfiability
queries rather than model-checking problems, it is convenient to encode them as
combinational BTOR2 circuits. Moreover, some hardware model checkers, such as
ABC [50], can simplify the circuits before performing satisfiability solving, which is
usually faster than solving the queries directly with satisfiability solvers.
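Because the example circuit has only two 8-bit state variables, the three quality levels can be checked by brute force. The Python sketch below enumerates all 2^16 states instead of building combinational circuits for a model checker as BTOR2-VAL does. It treats P(s) as "not (a == 0 and b == 2)", i.e., it assumes the input may always be 42, and classifies the four predicates of Table 1, taking the first two to be the constant predicates false and true. All helper names are hypothetical.

```python
from itertools import product

# Every (a, b) pair of 8-bit values: the 2**16 states of the circuit in Fig. 2a.
STATES = list(product(range(256), repeat=2))

def initial(s):
    return s == (2, 0)  # I(s): a = 2 and b = 0

def successor(s):
    a, b = s
    return a >> 1, b ^ 1  # TR(s, s'): deterministic and independent of the input

def safe(s):
    a, b = s
    return not (a == 0 and b == 2)  # P(s), assuming the input may always be 42

def reachable():
    seen, frontier = set(), [s for s in STATES if initial(s)]
    while frontier:
        s = frontier.pop()
        if s not in seen:
            seen.add(s)
            frontier.append(successor(s))
    return seen

REACHABLE = reachable()

def quality(inv):
    """Classify a predicate according to the three quality levels defined above."""
    if not all(inv(s) for s in REACHABLE):
        return "not invariant"
    if not all(safe(s) for s in STATES if inv(s)):  # consistency
        return "invariant but unsafe"
    initiation = all(inv(s) for s in STATES if initial(s))
    consecution = all(inv(successor(s)) for s in STATES if inv(s))
    if not (initiation and consecution):
        return "safe invariant but not inductive"
    return "safe and inductive invariant"

PREDICATES = {
    "false": lambda s: False,
    "true": lambda s: True,
    "b!=2": lambda s: s[1] != 2,
    "b>=0 && b<=1": lambda s: 0 <= s[1] <= 1,
}
for name, inv in PREDICATES.items():
    print(f"{name:14} {quality(inv)}")
```

The output reproduces the Quality column of Table 1. For instance, b!=2 fails consecution because the unreachable state with b = 3 satisfies it, while its successor has b = 2.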


6 Certifying Results of Software Verifiers: Violation 


In this section, we describe how BTOR2-CERT certifies verification results for
unsafe verification tasks. The unsafe versions of the BTOR2 circuit and its
translated C program in Fig. 2 with the state variable b being uninitialized
(namely, with line 10 in Fig. 2a and line 8 in Fig. 2b commented out) as well as
the software violation witness in Fig. 4 will be used to explain the translation
and validation of violation witnesses, as outlined in Fig. 5b.

The BTOR2 language defines a format for violation witnesses [7]. A BTOR2
violation witness contains a sequence of input values fed to the BTOR2 circuit
in each cycle and the initial values for uninitialized state variables. Figure 7
shows an example violation witness for the unsafe version of the BTOR2 circuit
in Fig. 2a. It demonstrates how to trigger the error specified by the 0th bad
statement (indicated by b0) by giving the initial value 2 to the 1st state variable b
(under #0; a is the 0th state variable) and 42 to the 0th input in in the 2nd
cycle (indicated by @2; cycles are counted from 0). The simulator BTORSIM [26]
takes a BTOR2 circuit and a BTOR2 violation witness and executes the circuit with
the values for inputs and states in the witness. It confirms the violation witness if
an error is triggered. The violation witness in Fig. 7 does not specify input values
in the first two cycles because they are irrelevant to the error. In this case,
BTORSIM will assume the unspecified values to be zero.
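The Python sketch below mimics this validation step for the unsafe example. It is a stand-in for BTORSIM, not its implementation. It parses a violation witness written by hand in the format described above, using the values given in the text (state 1 set to 2 under #0 and input 0 set to 42 under @2). The trailing symbols such as b#0 are our own annotation. The sketch then replays the witness on a hard-coded model of the unsafe circuit of Fig. 2a, defaulting unspecified values to zero.

```python
# A violation witness for the unsafe circuit, written by hand for illustration.
WITNESS = """\
sat
b0
#0
1 00000010 b#0
@0
@1
@2
0 00101010 in@2
.
"""

def parse_witness(text):
    """Return (initial state values, input values per cycle), keyed by position index."""
    init, inputs = {}, {}
    section, frame = None, 0
    for raw in text.splitlines():
        line = raw.strip()
        if line in ("", "sat", ".") or line.startswith("b"):
            continue  # header, bad-property line, terminator
        if line[0] in "#@":
            section, frame = line[0], int(line[1:])
            continue
        index, bits = line.split()[:2]
        if section == "#" and frame == 0:
            init[int(index)] = int(bits, 2)
        elif section == "@":
            inputs.setdefault(frame, {})[int(index)] = int(bits, 2)
    return init, inputs

def simulate(init, inputs, max_cycles):
    """Replay the unsafe circuit of Fig. 2a; return the cycle in which bad fires."""
    a = 2                    # state 0 keeps its init (line 9 of Fig. 2a)
    b = init.get(1, 0)       # state 1 is uninitialized; unspecified values are 0
    for cycle in range(max_cycles):
        inp = inputs.get(cycle, {}).get(0, 0)  # input 0 is `in`; default 0
        if a == 0 and b == 2 and inp == 42:    # bad property (line 16)
            return cycle
        a, b = (a >> 1) & 0xFF, (b ^ 1) & 0xFF
    return None

init, inputs = parse_witness(WITNESS)
print(init, inputs, simulate(init, inputs, max_cycles=10))  # prints {1: 2} {2: {0: 42}} 2
```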


6.1 Witness Translation 


Given a software violation witness of the translated C program, BTOR2-CERT
extracts the conditions over program variables from the protocol automaton.
These conditions are used by the software violation witness to prune out
irrelevant program paths and highlight an error path. BTOR2-CERT uses this
information to assign values to the corresponding BTOR2 inputs and state variables
in the form of a BTOR2 violation witness. For example, the software violation
witness in Fig. 4 will be translated to the BTOR2 violation witness in Fig. 7.

Fig. 7: A BTOR2 violation witness
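A minimal Python sketch of this translation for the running example follows. It assumes the software violation witness has already been reduced to the sequence of (C line, condition) pairs along the error path, as described for Fig. 4. It uses a hypothetical hard-coded mapping from lines of Fig. 2b to BTOR2 states and inputs, understands only conditions of the form var == value, and annotates values with symbols such as b#0 purely for readability.

```python
STATE_AT_LINE = {6: (1, "b")}    # C line -> (BTOR2 state index, name); a has index 0
INPUT_AT_LINE = {10: (0, "in")}  # C line read once per loop iteration -> (input index, name)
WIDTH = 8                        # all variables are 8-bit bit-vectors

def parse_equality(condition):
    """Parse 'var == value' into an int value; return None for anything else."""
    if condition is None or "==" not in condition:
        return None
    _, value = condition.split("==")
    return int(value.strip())

def to_btor2_witness(path):
    init, inputs, cycle = {}, {}, -1
    for line, condition in path:
        value = parse_equality(condition)
        if line in STATE_AT_LINE and value is not None:
            init[STATE_AT_LINE[line]] = value
        elif line in INPUT_AT_LINE:
            cycle += 1  # each execution of the input line starts a new cycle
            if value is not None:
                inputs.setdefault(cycle, {})[INPUT_AT_LINE[line]] = value
    out = ["sat", "b0", "#0"]  # the 0th bad property is reached
    for (index, name), value in sorted(init.items()):
        out.append(f"{index} {value:0{WIDTH}b} {name}#0")
    for k in range(cycle + 1):
        out.append(f"@{k}")
        for (index, name), value in sorted(inputs.get(k, {}).items()):
            out.append(f"{index} {value:0{WIDTH}b} {name}@{k}")
    out.append(".")
    return "\n".join(out)

# The error path of Fig. 4: b == 2 at line 6, two plain iterations, then in == 42.
PATH = [(6, "b == 2"), (10, None), (10, None), (10, "in == 42")]
print(to_btor2_witness(PATH))
```

Counting the executions of the input line is what turns the third loop iteration of the C program into frame @2 of the BTOR2 witness.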


6.2 Witness Validation via Execution 


Following the idea of execution-based witness validation [47], BTOR2-VAL checks
BTOR2 violation witnesses by invoking the simulator BTORSIM on the original
BTOR2 circuit and the translated BTOR2 violation witness. An advantage of
execution-based witness validation is its speed: In our evaluation, BTOR2-VAL
was able to validate BTOR2 violation witnesses translated from software violation
witnesses much faster than the software verifiers took to find the bugs. The
speed of BTOR2-VAL minimizes the overhead of validating the alarms reported by
software verifiers and makes the results of software verifiers more trustworthy
and transparent for hardware designers.


7 Evaluation 


To address the open questions highlighted in Sect. 1.1, we evaluated the proposed
certifying hardware-verification framework BTOR2-CERT on more than 1 000
BTOR2 circuits, and we compared the witness validator BTOR2-VAL, prepended with
witness translation, against the top contenders in the witness-validation track of
SV-COMP 2023 [24]. Our experiment is designed to answer the following research questions:
• RQ1: Can BTOR2-CERT translate software witnesses to BTOR2 witnesses?

• RQ2: Is BTOR2-VAL prepended with witness translation effective compared
to state-of-the-art software witness validators?

• RQ3: Is BTOR2-VAL prepended with witness translation efficient compared
to state-of-the-art software witness validators?

• RQ4: Is the run-time consumed by witness validators shorter than the run-time
consumed by software verifiers?

• RQ5: Can BTOR2-CERT complement conventional hardware model checking
by providing additional certified verification results?


7.1 Benchmark Set 


We executed our experiments on a benchmark set consisting of 1214 safety- 
verification tasks of BTOR2 circuits, among which 758 are safe and 456 are unsafe. 
The verification tasks are collected from HWMCC as well as other sources and were 
used to compare the performance of hardware and software model checkers [5]. 


7.2 Experimental Settings 


All experiments were conducted on machines running Ubuntu 22.04 (64 bit),
each with a 3.4 GHz CPU (Intel Xeon E3-1230 v5) with 8 processing units and
33 GB of RAM. The resource limits imposed on verifying translated C programs
and validating generated witnesses are both set to 2 CPU cores, 15 min of CPU
time, and 15 GB of RAM. We used BENCHEXEC [53] to ensure reliable resource
measurement and reproducible results. BTOR2-CERT uses BTOR2C at commit
36c1ad52 for translating a BTOR2 circuit to a C program. In our experiment, we
configured the witness validator BTOR2-VAL to use the PDR [54] implementation
in ABC [50] at commit 65ccd3cc and BTORSIM [26] as the underlying hardware
model checker and simulator, respectively.⁴ We also tried AVR [51] for validating
correctness witnesses, but it encountered errors on many instrumented circuits
even though the circuits are syntactically valid according to BTOR2TOOLS [26].


7.3 Evaluated Verifiers and Validators 


To verify the translated C programs, we used CPACHECKER [12] at revision 44619
and UAUTOMIZER [15] at commit 6f436663 on safe tasks because they are good
at constructing invariants in the competitions. We configured CPACHECKER to
run four algorithms based on Craig interpolation [55], namely IMC [56, 57],
ISMC [58], Impact [59], and predicate abstraction [60]. On unsafe tasks, we
evaluated the BMC [61] implementations in CPACHECKER, CBMC [13], and ESBMC [14]
because BMC is the prevailing technique for bug hunting. Both CBMC and ESBMC
were downloaded from the archiving repository of SV-COMP 2023 [52]. For
UAUTOMIZER, we used its default settings in SV-COMP for both safe and unsafe tasks.

To evaluate BTOR2-VAL, we prepended it with the witness-translation step and
compared the combination, which takes software witnesses as input, to validators
for software witnesses. For correctness witnesses, we evaluated UAUTOMIZER, the
first-place winner of the witness-validation track in SV-COMP 2023 [24]. We
also used an emerging validator LIV [46] at commit cf736e45, which decomposes
a program into straight-line sub-programs to check inductive invariants. We cannot
compare BTOR2-VAL to CERTIFAIGER [29, 30] because CERTIFAIGER consumes a
candidate inductive length as input, while BTOR2-VAL expects an invariant from
the witnesses. For violation witnesses, we compared BTOR2-VAL to the execution-based
validators [47] CPA-W2T and FSHELL-W2T. The former is of the same version
as CPACHECKER (i.e., at revision 44619) and the latter was downloaded from
the tool archive of SV-COMP 2023 [52]. We also evaluated METAVAL [27], a tool
using validation via verification, but it did not terminate when instrumenting the
translated C programs and failed to validate any witness in our experiment.


4 As ABC works on the bit level, we bit-blasted BTOR2 circuits into the AIGER format
with BTOR2AIGER [26] before invoking ABC.


7.4 Results 


RQ1: SW-to-HW Witness Translation. The upper part of Table 2 (resp.
Table 3) shows the numbers of correctness (resp. violation) witnesses produced by
the software verifiers and those successfully translated by the witness translator in
BTOR2-CERT. Table 2 additionally shows in its 2nd row the numbers of software
witnesses with candidate invariants annotated to the loop head of a translated C
program. About 97% of the candidate invariants in software correctness witnesses
can be translated to BTOR2 witness circuits. The 14 candidate invariants from
CPACHECKER that could not be translated failed because the C-expression parser⁵
exceeded the time limit when constructing abstract syntax trees. This is a technical
limitation orthogonal to the proposed approach. Furthermore, all 4 candidate
invariants of UAUTOMIZER that could not be translated refer to undeclared program
variables, rendering the witnesses syntactically incorrect.⁶
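For reference, these figures can be checked against the rows "w/ candidate inv." and "translated" of Table 2, where the four CPACHECKER columns sum to 519 witnesses with candidate invariants and 505 translated ones:

```latex
\frac{576}{594} \approx 0.970, \qquad
594 - 576 = \underbrace{(519 - 505)}_{\text{CPAchecker}} + \underbrace{(75 - 71)}_{\text{UAutomizer}} = 14 + 4 = 18
```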

All software violation witnesses were successfully translated by
BTOR2-CERT. The median translation time was below 2 s for both correctness
and violation witnesses. Moreover, measured by the number of lines of a BTOR2
witness, the translated correctness witnesses have a median size of 321, and the
violation witnesses have a median size of 308. The results show that the information
found by software verifiers can be translated into and represented in a native
hardware-modeling format.


RQ2: Effectiveness of BTOR2-VAL. The lower part of Table 2 (resp. Table 3)
summarizes the numbers of correctness (resp. violation) witnesses that were
validated by BTOR2-VAL and the compared validators.

BTOR2-VAL was able to validate the correctness witnesses produced by both
CPACHECKER and UAUTOMIZER. When configured to accept safe and inductive
invariants (recall the three levels of invariant quality in Sect. 5), it validates 329
out of 576 correctness witnesses translated to BTOR2 witness circuits. In contrast,
UAUTOMIZER, the winner of the witness-validation track in SV-COMP 2023 [24],
was not able to validate any correctness witness produced by CPACHECKER (the
corresponding cells are marked as "-"). LIV is designed to confirm safe and
inductive invariants [46] and accepted 305 correctness witnesses in total, similar


5 BTOR2-CERT uses pycparser 2.21 (https://github.com/eliben/pycparser).
6 https://github.com/ultimate-pa/ultimate/issues/660 
Table 2: Summary of results on validating correctness witnesses


Verif.                        CPACHECKER                     UAUTOMIZER   Sum of each analysis
Val.                          IMC   ISMC   Impact   PredAbs               accepted   rejected   others
(proofs)                      119   85     155      182      79           620        -          -
w/ candidate inv.             114   79     148      178      75           594        -          -
translated                    113   79     139      174      71           576        -          -
BTOR2-VAL (invariant)         77    66     117      119      67           446        105        69
BTOR2-VAL (safe)              27    47     90       118      45           327        228        65
BTOR2-VAL (safe & inductive)  28    47     90       118      46           329        243        48
LIV                           15    32     95       122      41           305        252        63
UAUTOMIZER                    -     -      -        -        74           74         2          3


Table 3: Summary of results on validating violation witnesses


Verif.         CBMC   CPACHECKER   ESBMC   UAUTOMIZER   Sum of each analysis
Val.                                                    accepted   rejected   others
(alarms)       369    197          302     31           899        -          -
BTOR2-VAL      59     197          295     27           578        321        0
CPA-W2T        0      122          0       0            122        -          777
FSHELL-W2T     44     38           44      24           150        -          749


to BTOR2-VAL. BTOR2-VAL and LIV agreed on the majority of the correctness
witnesses, and the cases where they computed different verdicts were caused by
a bug⁷ in LIV, which has been fixed by its developers. The results show that
BTOR2-VAL is more robust than UAUTOMIZER and achieves effectiveness similar to
that of LIV. We manually inspected several witnesses rejected by both BTOR2-VAL
and LIV and found that they indeed contain incorrect candidate invariants that
do not overapproximate the reachable state spaces. Such invalid invariants might
be caused by bugs in the step in which software verifiers convert their internal
formula representation back to the programming language C.

Table 2 also reports the results when BTOR2-VAL is configured to accept
correctness witnesses with different levels of invariant quality. Overall, 77% of
the candidate invariants derived by software verifiers passed the invariant check
of BTOR2-VAL, but only 57% are deemed safe and inductive. As expected, the
number of rejections increases with the strictness of the invariant quality. However,
there are 2 instances in Table 2 that passed the level "safe & inductive" but were
not confirmed at the level "safe" by BTOR2-VAL. Such cases occurred because
ABC, the backend verifier of BTOR2-VAL, timed out when performing model
checking, whereas the consistency, initiation, and consecution checks based on
satisfiability easily went through. Among the four interpolation-based algorithms
in CPACHECKER, predicate abstraction is the best in terms of invariant quality:
It generated the most safe and inductive invariants. The results demonstrate
the unique value of BTOR2-VAL in quantifying the quality of invariants derived
by software verifiers.

For violation witnesses, BTOR2-VAL was far more effective than CPA-W2T
and FSHELL-W2T in our experiment. Among 899 violation witnesses generated
by software verifiers, BTOR2-VAL was able to validate 578 cases; it rejected

T https://gitlab.com/sosy-lab/software/liv/-/issues/2 
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Fig. 8: CPU-time comparison of verification and witness validation (unit: s) 


other witnesses because they contain an incomplete or infeasible error path. In 
comparison, CPA-w2T and FSuELL-w27 only confirmed 122 and 150 witnesses, 
respectively. The numbers of rejected witnesses for CPA-w2T and FSHELL-w2T 
are not listed in Table 3 as the tools do not distinguish rejection of witnesses 
from other errors. We also observed that only 11 violation witnesses produced 
by CPACHECKER, EsBMC, and UAUTOMIZER were not validated by Bror2-VAL, 
but witnesses generated by CBwoc suffered from a high rejection rate. This is 
because the violation witnesses of CBMc often report an infeasible error path. 
Moreover, we notice that for many cases, different error paths are printed in 
CBMc's violation witnesses and the console logs for its execution.’ If we extract 
BTOR2 violation witnesses from the console logs instead, Bror2-Vat could validate 
359 out of the 369 cases where Cpmc found an alarm. The effectiveness of Bror2- 
Var in confirming translated BTOR2 violation witnesses showcases the value 
of BroR2-CzEnr because hardware designers can now trust software verifiers to 
detect bugs in their circuits and obtain a certified test case to trigger an error 
if software verifiers reported one. 


RQ3: Efficiency of Btor2-Val. We compared the CPU time required by
Btor2-Val and other state-of-the-art validators. In our experiments,
Btor2-Val (configured to accept safe and inductive invariants) achieved a median
speedup of 2.2x over LIV for correctness-witness validation, and median
speedups of 11x and 1.1x over CPA-w2t and FShell-w2t, respectively, for
violation-witness validation. In addition, Fig. 8 shows scatter plots of the CPU
time consumed by the compared validators. A data point (x, y) in the plots
corresponds to a case where CPAchecker took x seconds to produce a witness
and a validator took y seconds to validate the witness. Observe that most data
points of Btor2-Val lie below those of the other validators. The efficiency of the
proposed certifying framework in translating and validating violation witnesses
reduces the overhead of applying software analyzers to find hardware bugs and
makes the results of software verifiers trustworthy for hardware designers.
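A median speedup such as the ones reported above can be computed from per-task CPU times. The sketch below assumes that the speedup is the per-task ratio of the other validator's CPU time to that of Btor2-Val, taken over tasks both tools handled, and then aggregated with the median; this aggregation is an assumption, and the CPU times in the usage example are made-up placeholders, not measurements from this study.

```python
from statistics import median

def median_speedup(baseline_cpu, new_cpu):
    """Median of per-task CPU-time ratios baseline/new over tasks present in both dicts."""
    common = sorted(baseline_cpu.keys() & new_cpu.keys())
    ratios = [baseline_cpu[t] / new_cpu[t] for t in common if new_cpu[t] > 0]
    return median(ratios)

# Hypothetical CPU times in seconds (placeholders, not data from Fig. 8).
other_validator = {"task1": 4.0, "task2": 9.0, "task3": 1.5}
btor2_val = {"task1": 2.0, "task2": 3.0, "task3": 1.5}
print(median_speedup(other_validator, btor2_val))  # -> 2.0
```

Using the median of per-task ratios keeps a few very long-running tasks from dominating the summary, which an aggregate ratio of total CPU times would not.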


8 https://github.com/diffblue/cprover-sv-comp/issues/70 
RQ4: Verification versus Validation Time. Figure 8a (resp. Figure 8b)
compares the CPU time for CPAchecker to compute a verdict and generate a
correctness (resp. violation) witness with the CPU time for a validator to check
the witness. Almost all data points are below the diagonal, indicating that
validation is typically faster than verification. This speedup shows that the
validators are able to use the information in witnesses to reconstruct proofs of
correctness or violation more efficiently than by verifying the task from scratch.


RQ5: Complementing HW Model Checking with Btor2-Cert. The
empirical evaluation in the TACAS 2023 publication [5] on Btor2C demonstrates
that software verifiers are able to complement state-of-the-art hardware model
checkers by finding more bugs and uniquely solving dozens of tasks. We take a
step further and investigate whether the verification results for those additional
alarms and uniquely solved tasks can be certified by Btor2-Cert.

Btor2-Cert certified 37, 1, and 4 alarms found by the BMC implementations
of CBMC, CPAchecker, and ESBMC, respectively, that cannot be detected
by the BMC implementation of ABC.⁹ The additional alarms found by CBMC
alone account for up to 8% of the unsafe tasks in our benchmark set. With the
help of Btor2-Cert, the violation witnesses generated by software verifiers can be
translated to BTOR2 witnesses and validated by BtorSim. That is, the property
violations reported by software verifiers can be replayed entirely in the hardware
domain, demonstrating the unique ability of Btor2-Cert to provide trustworthy
verification results obtained by software analyzers.

For property satisfaction, although the previous study shows that software
verifiers are not as good at finding correctness proofs as their hardware
counterparts, we still observed one case where ABC (the backend verifier used by
Btor2-Val) timed out on the task on its own but required less than 3 s to
reconstruct a proof using the invariant generated by CPAchecker, and another
case with a 5x run-time speedup.


Summary of Results. From the reported results, we conclude that (1) software
witnesses can be translated to hardware witnesses (Table 2 and Table 3), (2)
Btor2-Val is effective (Table 2 and Table 3) and efficient (Fig. 8), (3) witness
validation by Btor2-Val consumes less time than software verification (Fig. 8),
and (4) Btor2-Cert complements state-of-the-art hardware model checkers.

As a by-product of this work, our intensive investigation of software witnesses 
led to the discovery of several bugs in software verifiers. We reported the issues 
to the developers of the tools, and some of the bugs have been fixed. A complete 
list of issues that we found in software analyzers during this project is available 
on the supplementary webpage [62]. 


7.5 Threats to Validity 


For external validity, our claims are established on a large set of BTOR2 circuits
to increase confidence, but it is unclear whether they will hold for tasks with
features that are not covered by the benchmark set used. For construct validity,
we report that witness validation is faster than verification, but validation and
verification were performed on behaviorally equivalent but syntactically different
models (namely, a BTOR2 circuit vs. a C program). While this setting is not exactly
the same as in a previous publication [4], it is necessary because our experiment is
designed to investigate how information in software witnesses can be used by
hardware analyzers. We compared Btor2-Val, preceded by witness translation,
to software witness validators. The former also takes the original BTOR2 circuit
as input, whereas the software validators do not leverage circuit information.
We performed the comparison this way because the hardware witness validator
Certifaiger [29] does not accept an invariant as input. For internal validity, we
ran the experiments with the popular benchmarking framework BenchExec [53]
to ensure reproducibility.

⁹ We considered the 359 validated witnesses translated from the console logs of CBMC.


8 Conclusion 


Validating verification results is vital for making formal methods applicable in
practice, as it reinforces the trust of users and offers more insight into the
analyzed model. In this manuscript, we proposed Btor2-Cert, a certifying and
validating hardware-verification framework built upon translators and software
analyzers. Btor2-Cert is an open-source toolchain comprising the BTOR2-to-C
translator Btor2C, certifying verifiers for C programs, a C-to-BTOR2 witness
translator, the BTOR2 simulator BtorSim, and the validator Btor2-Val. We
evaluated Btor2-Cert's capability of transferring information between software
and hardware analyzers and of providing certified verification results on a large
benchmark set. By employing software model checkers for hardware verification, we
identified and certified 8% of the unsafe tasks in our benchmark set that the
state-of-the-art conventional hardware model checker ABC overlooked. For future
work, we will extend Btor2-Cert to accommodate temporal decomposition [48], a
preprocessing technique used to simplify sequential circuits before model checking.
A similar extension [31] has been made to validators of k-inductiveness [29, 30].
Abstract. Sequential decision-making tasks often require satisfaction
of multiple, partially contradictory objectives. Existing approaches are
monolithic, in that a single policy fulfills all objectives. We present
auction-based scheduling, a decentralized framework for multi-objective sequential
decision making. Each objective is fulfilled using a separate and indepen- 
dent policy. Composition of policies is performed at runtime, where at 
each step, the policies simultaneously bid from pre-allocated budgets for 
the privilege of choosing the next action. The framework allows poli- 
cies to be independently created, modified, and replaced. We study path 
planning problems on finite graphs with two temporal objectives and 
present algorithms to synthesize policies together with bidding policies 
in a decentralized manner. We consider three categories of decentralized 
synthesis problems, parameterized by the assumptions that the policies 
make on each other. We identify a class of assumptions called assume- 
admissible for which synthesis is always possible for graphs whose every 
vertex has at most two outgoing edges. 


1 Introduction 


Sequential decision-making tasks often require satisfaction of multiple,
partially contradictory objectives. For example, the control policy of a traffic
light may need to choose signals in a way that the traffic throughput is maximized
while the maximum waiting time is minimized [84]; the control policy operating
an unmanned aerial vehicle may need to navigate in a way that the destination is
reached while no-fly zones are avoided [33]; and the policy of an operating-system
resource manager needs to allocate resources to tasks in a way that deadlocks
are avoided while fairness is maintained [3].

We propose a decentralized synthesis framework for policies when tasks are
given as a conjunction of two objectives $\Phi_1$ and $\Phi_2$, and the policies need to
choose actions from a common action space. The key idea is that $\Phi_1$ and $\Phi_2$ will
be accomplished, respectively, using two independently designed action policies
$\alpha_1$ and $\alpha_2$, and the composition of $\alpha_1$ and $\alpha_2$ at runtime will generate a
policy for $\Phi_1 \wedge \Phi_2$. The challenge is that at each time point, one action needs
to be chosen, whereas $\alpha_1$ and $\alpha_2$ might select conflicting actions. For example,
when developing a plan for a robot, $\Phi_1$ and $\Phi_2$ might specify two target locations,
and $\alpha_1$ and $\alpha_2$ may select opposite directions in a location.

© The Author(s) 2024


We propose a novel composition mechanism called auction-based scheduling:
both policies are allocated bounded monetary budgets, and at each point in time,
an auction (aka bidding) is held, where the policies bid from their budgets for the
privilege of being scheduled to choose the action. More formally, we equip each
action policy $\alpha_i$, for $i \in \{1,2\}$, with a bidding policy $\beta_i$, which is a function that
proposes a bid from the available budget based on the history of the interaction.
A tender for objective $\Phi$ is a triple $\tau = (\alpha, \beta, \mathbb{B})$, where $\alpha$ is an action policy, $\beta$
is a bidding policy, and $\mathbb{B} \in (0, 1)$ is a minimal budget required for the tender to
guarantee $\Phi$. Two tenders $\tau_1$ and $\tau_2$ are compatible if $\mathbb{B}_1 + \mathbb{B}_2 < 1$, which is when
they can be composed at runtime as follows. Each tender $\tau_i$, for $i \in \{1,2\}$, is
allocated an initial budget that exceeds $\mathbb{B}_i$, where the sum of budgets equals 1.
At each point in time, the tenders simultaneously choose bids using their bidding
policies, and the higher bidder chooses an action using its action policy and pays
the bid to the other tender. Thus, the sum of budgets stays constant at 1. Note that
the composition gives rise to a path in the graph. The decentralized synthesis
problem asks: given a graph $G$ and objectives $\Phi_1, \Phi_2$ such that $\Phi_1 \wedge \Phi_2 \not\equiv \mathit{false}$,
for each $\Phi_i$ compute $\tau_i$ such that no matter which tender it is composed with,
the composition generates a path that fulfills $\Phi_i$. The framework is sound by
construction, namely, the composition of compatible tenders satisfies $\Phi_1 \wedge \Phi_2$.


The advantage of auction-based scheduling is modularity at two levels. First,
since the designs of policies do not depend on each other, they can be created
independently and in parallel, e.g., by different vendors or in a parallel
computation. Second, since the policies operate independently, they can be modified
and replaced separately. For example, when only the objective $\Phi_1$ changes, there
is no need to alter the policy $\alpha_2$, and vice versa.


Bidding for the next action encourages the policy with higher scheduling
urgency to bid higher, and at the same time, the bounds on budgets ensure
fairness, namely that no policy is starved. Auction-based scheduling adds new,
complementary features to the arsenal of modular approaches in multi-objective
decision-making. With the conventional decentralized synthesis approaches, the
policies are composed either concurrently [39] or in a turn-based manner [23].
Concurrent actions are meaningful if each policy needs to act on its own local
control variables, e.g., when the local control policies of two robots concurrently
move the robots towards their destinations in a shared workspace. In our case,
the set of actions is common between policies, and the concurrent interaction
is unsuitable. Likewise, turn-based actions are also unsuitable in our setting
because it is unclear how to assign turns to policies a priori. We will demonstrate
(Ex. 2) that an inappropriate turn assignment to policies may violate some of
the objectives, while auction-based scheduling succeeds in fulfilling all of them.


We study auction-based scheduling in the context of path planning on finite
directed graphs with pairs of $\omega$-regular objectives on their paths, and present
algorithms for the decentralized synthesis problem with increasing levels of
assumptions made by the tenders about each other: (a) strong synthesis, with no
assumptions and the most robust solution, (b) assume-admissible synthesis, with
the assumption that the other tender is not purely cynical and behaves rationally
with respect to its own objective, and (c) assume-guarantee synthesis, with
explicit contract-based pre-coordination. We show that for graphs whose every vertex
has at most two outgoing edges, for every pair of $\omega$-regular objectives $\Phi_1, \Phi_2$, and
for all three classes of problems (a), (b), and (c), there exist PTIME decentralized
synthesis algorithms that either compute compatible tenders or report that
no compatible tenders with the respective assumptions exist; surprisingly, we
show that compatible tenders always exist for (b). For general graphs, we show
that the problems are in NP $\cap$ coNP. All our algorithms internally solve bidding
games using known algorithms from the literature [37, 38]. Due to lack of
space, some proofs are omitted but can be found in the extended version [17].


2 Preliminaries 


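Let $\Sigma$ be a finite alphabet. We use $\Sigma^*$ and $\Sigma^\omega$ to denote the sets of finite
and infinite words over $\Sigma$, respectively, and $\Sigma^\infty$ to denote $\Sigma^* \cup \Sigma^\omega$. For two words
$u \in \Sigma^*$ and $v \in \Sigma^\infty$, let $u \preceq v$ denote that $u$ is a prefix of $v$, i.e., there exists a $w$
such that $v = uw$. Given a language $L \subseteq \Sigma^\infty$, define $\mathit{pref}(L)$ to be the set of all
finite prefixes of words in $L$, i.e., $\mathit{pref}(L) = \{u \in \Sigma^* \mid \exists v \in L.\ u \preceq v\}$.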


Graphs. We formalize path planning problems on graphs. A graph $G$ is a tuple
$(V, v^0, E)$, where $V$ is a finite set of vertices, $v^0 \in V$ is a designated initial vertex,
and $E \subseteq V \times V$ is a set of directed edges. If $(u, v) \in E$, then $v$ is a successor
of $u$. A binary graph is a graph whose every vertex has at most two successors.
A path over $G$ is a sequence of vertices $v^0 v^1 \ldots$ such that every $(v^t, v^{t+1}) \in E$.
Unless explicitly mentioned otherwise, paths always start at $v^0$. We use $\mathit{Paths}^{\mathrm{fin}}(G)$
and $\mathit{Paths}^{\mathrm{inf}}(G)$ to denote the sets of finite and infinite paths, respectively.

A strongly connected component (SCC) of the graph $G$ is a maximal set $S$ of
vertices such that there is a path between every pair of vertices of $S$. An SCC $S$
is called a bottom strongly connected component (BSCC) if there is no edge
from a vertex in $S$ to a vertex outside of $S$. The graph $G$ is itself called strongly
connected if $V$ is an SCC.
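BSCCs play a central role later on (see Theorem 2), so a small sketch of how to find them may help. It relies on a simple characterization: a vertex $v$ lies in a BSCC iff every vertex reachable from $v$ can reach $v$ back, in which case the set reachable from $v$ is exactly that BSCC. This quadratic approach is an illustrative choice (Tarjan's algorithm is the usual linear-time alternative), and the adjacency-dict encoding, in which every vertex must appear as a key, is an assumption of the sketch.

```python
def reachable_from(graph, v):
    """Vertices reachable from v (including v itself) in an adjacency-dict graph."""
    seen, stack = {v}, [v]
    while stack:
        for w in graph[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def bsccs(graph):
    """Bottom SCCs: v is in a BSCC iff every vertex reachable from v can reach v back."""
    reach = {v: reachable_from(graph, v) for v in graph}
    found = []
    for v in graph:
        if all(v in reach[u] for u in reach[v]):
            component = frozenset(reach[v])   # reach[v] is exactly v's BSCC
            if component not in found:
                found.append(component)
    return found

# Hypothetical graph: a leads into the self-loop {b} or into the cycle {c, d}.
g = {"a": ["b", "c"], "b": ["b"], "c": ["d"], "d": ["c"]}
print(bsccs(g))
```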


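Objectives. Fix a graph $G$. An objective $\Phi$ in $G$ is a set of infinite paths, i.e.,
$\Phi \subseteq \mathit{Paths}^{\mathrm{inf}}(G)$. For an infinite path $\rho$, we use $\mathit{Inf}(\rho)$ to denote the set of vertices
that $\rho$ visits infinitely often. We focus on the following objectives:

Reachability: for $S \subseteq V$, $\mathit{Reach}_G(S) := \{v^0 v^1 \ldots \in \mathit{Paths}^{\mathrm{inf}}(G) \mid \exists i \ge 0.\ v^i \in S\}$,
Safety: for $S \subseteq V$, $\mathit{Safe}_G(S) := \{v^0 v^1 \ldots \in \mathit{Paths}^{\mathrm{inf}}(G) \mid \forall i \ge 0.\ v^i \in S\}$,
Büchi: for $S \subseteq V$, $\mathit{B\ddot{u}chi}_G(S) := \{\rho \in \mathit{Paths}^{\mathrm{inf}}(G) \mid \mathit{Inf}(\rho) \cap S \neq \emptyset\}$,
Parity (max, even): for $\mathit{Col}\colon V \to [0; k]$ for some $k > 0$, $\mathit{Parity}_G(\mathit{Col}) :=$
$\{\rho \in \mathit{Paths}^{\mathrm{inf}}(G) \mid \max\{i \mid \exists v \in \mathit{Inf}(\rho).\ \mathit{Col}(v) = i\} \text{ is even}\}$.

Given an objective $\Phi$, we will use $\Phi^c$ to denote its complement, i.e., $\Phi^c =
\mathit{Paths}^{\mathrm{inf}}(G) \setminus \Phi$. Observe that $(\mathit{Reach}_G(S))^c = \mathit{Safe}_G(V \setminus S)$.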
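On finite graphs, these objectives are often tested on ultimately periodic (lasso) paths $\rho = u \cdot w^\omega$, for which $\mathit{Inf}(\rho)$ is exactly the set of vertices on the cycle $w$. The sketch below assumes this lasso representation (a prefix list and a non-empty cycle list); the function names are illustrative and not taken from the paper. The last check mirrors the duality between reachability and safety stated above.

```python
def inf_set(prefix, cycle):
    """Inf(rho) for the lasso rho = prefix . cycle^omega: exactly the vertices on the cycle."""
    return set(cycle)

def reach(prefix, cycle, S):
    return any(v in S for v in prefix + cycle)

def safe(prefix, cycle, S):
    return all(v in S for v in prefix + cycle)

def buchi(prefix, cycle, S):
    return bool(inf_set(prefix, cycle) & S)

def parity(prefix, cycle, col):
    """Max-even parity: the largest colour visited infinitely often must be even."""
    return max(col[v] for v in inf_set(prefix, cycle)) % 2 == 0

# Hypothetical lasso a b (c d)^omega with colours Col.
prefix, cycle = ["a", "b"], ["c", "d"]
col = {"a": 3, "b": 1, "c": 0, "d": 2}
V = set(col)
print(reach(prefix, cycle, {"b"}), safe(prefix, cycle, V - {"b"}))  # -> True False
print(buchi(prefix, cycle, {"d"}), parity(prefix, cycle, col))      # -> True True
```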
Action policies. Fix a graph $G$. An action policy is a function $\alpha\colon \mathit{Paths}^{\mathrm{fin}}(G) \to V$
that chooses the next vertex to extend any given finite path $\rho v$, where $(v, \alpha(\rho v)) \in E$.
The action policy $\alpha$ is memoryless if for every pair of distinct finite paths
$\rho v$ and $\rho' v$ that end in the same vertex $v$, it holds that $\alpha(\rho v) = \alpha(\rho' v)$; in this case,
we simply write $\alpha(v)$. An action policy $\alpha$ generates a unique infinite path in $G$,
denoted $\mathit{out}(\alpha)$ and defined inductively as follows. The initial vertex is $v^0$. For
every prefix $v^0, \ldots, v^i$ of $\mathit{out}(\alpha)$, for $i \ge 0$, we have $v^{i+1} = \alpha(v^0, \ldots, v^i)$. We say that the
policy $\alpha$ satisfies a given objective $\Phi$, written $\alpha \models \Phi$, iff $\mathit{out}(\alpha) \in \Phi$.
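For a memoryless action policy on a finite graph, $\mathit{out}(\alpha)$ is necessarily a lasso: the path eventually revisits a vertex and repeats from there on. The sketch below computes the prefix and cycle of $\mathit{out}(\alpha)$ and checks $\alpha \models \mathit{Reach}_G(S)$. The encoding (a dict from each vertex to its chosen successor), the function name, and the small graph, in which sinks carry self-loops so that every path is infinite, are assumptions for illustration.

```python
def out_memoryless(edges, alpha, v0):
    """Return (prefix, cycle) such that out(alpha) = prefix . cycle^omega."""
    position, path, v = {}, [], v0
    while v not in position:
        if alpha[v] not in edges[v]:
            raise ValueError(f"alpha({v}) = {alpha[v]} is not a successor of {v}")
        position[v] = len(path)
        path.append(v)
        v = alpha[v]
    start = position[v]          # the first repeated vertex closes the cycle
    return path[:start], path[start:]

# Hypothetical graph (sinks carry self-loops so that every path is infinite).
edges = {"a": ["b", "e"], "b": ["c", "d"], "e": ["f", "g"],
         "c": ["c"], "d": ["d"], "f": ["f"], "g": ["g"]}
alpha = {"a": "b", "b": "d", "e": "g", "c": "c", "d": "d", "f": "f", "g": "g"}
prefix, cycle = out_memoryless(edges, alpha, "a")
print(prefix, cycle)                                  # -> ['a', 'b'] ['d']
print(any(v in {"d", "f"} for v in prefix + cycle))   # alpha |= Reach({d, f}) -> True
```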


3 The Auction-Based Scheduling Framework 


Consider a graph $G = (V, v^0, E)$. A pair of objectives $\Phi_1, \Phi_2 \subseteq V^\omega$ in $G$ is
called overlapping if the two objectives have a nonempty intersection (i.e., $\Phi_1 \cap \Phi_2 \neq \emptyset$). The
multi-objective planning problem asks to synthesize an action policy that satisfies
the global objective $\Phi_1 \cap \Phi_2$ for overlapping $\Phi_1, \Phi_2$.

We propose a decentralized approach to the problem. Our goal is to design
two action policies $\alpha_1$ and $\alpha_2$ for $\Phi_1$ and $\Phi_2$, respectively. We will equip each
action policy with a bidding policy, which it will use at runtime to bid for choosing
the action at each time point. We formalize this below.


Definition 1 (Bidding policies). A bidding policy is a function $\beta\colon V \times [0, 1] \to
[0, 1]$ with the constraint that $\beta(v, B) \le B$ for every vertex $v$ and every amount
of available budget $B \in [0, 1]$.


We equip a pair of action and bidding policies with a threshold budget, which 
represents the greatest lower bound on the initial budget needed for the policies 
to guarantee their objective, and we call the resulting triple a tender. 


Definition 2 (Tenders). A tender for a given graph $G$ is a triple $(\alpha, \beta, \mathbb{B})$ of
an action policy $\alpha$, a bidding policy $\beta$, and a threshold budget $\mathbb{B} \in [0, 1]$. The set
of all tenders for $G$ is denoted $\mathcal{T}^G$. A tender $\tau$ satisfies an objective $\Phi$, denoted
$\tau \models \Phi$, iff $\alpha \models \Phi$ (i.e., when the tender is operating alone on the graph).


Next, we formalize the composition of two tenders at runtime, which produces
an action policy that uses a register of memory to keep track of the available
budgets. We introduce some notation. A configuration is a pair $(v, B_1)$,
where $v$ is a vertex and $B_1$ is the budget available to the first tender. We
normalize the sum of budgets to 1; hence, implicitly, the budget available to the
second tender is $B_2 = 1 - B_1$. Let $C = V \times [0, 1]$ be the set of all configurations.
For a given sequence of configurations $s = (v^0, B_1^0)(v^1, B_1^1) \ldots \in C^\infty$, let
$\mathit{proj}_V(s)$ denote the path $v^0 v^1 \ldots$. A history is a finite sequence of configurations
$h = (v^0, B_1^0) \ldots (v^k, B_1^k) \in C^*$ with the constraint that $\mathit{proj}_V(h) \in \mathit{Paths}^{\mathrm{fin}}(G)$. Let
$H$ be the set of all histories.


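Definition 3 (Composition of tenders). Let $G$ be a graph, and let $\tau_1 = (\alpha_1, \beta_1, \mathbb{B}_1)$
and $\tau_2 = (\alpha_2, \beta_2, \mathbb{B}_2)$ be two tenders. The tenders $\tau_1$ and $\tau_2$ are compatible iff
$\mathbb{B}_1 + \mathbb{B}_2 < 1$. If compatible, then their composition, denoted $\tau_1 \bowtie \tau_2$, is a function
$\tau_1 \bowtie \tau_2\colon H \to C$ defined as follows. Given a history $h = (v^0, B_1^0) \ldots (v^k, B_1^k) \in H$
with $\rho v = \mathit{proj}_V(h)$, let $b_1 := \beta_1(v^k, B_1^k)$ and $b_2 := \beta_2(v^k, 1 - B_1^k)$. Then,

– if $b_1 \ge b_2$, then $\tau_1 \bowtie \tau_2(h) = (\alpha_1(\rho v), B_1^k - b_1)$, and
– if $b_1 < b_2$, then $\tau_1 \bowtie \tau_2(h) = (\alpha_2(\rho v), B_1^k + b_2)$.

Given an initial configuration $(v^0, B_1^0)$ with $B_1^0 > \mathbb{B}_1$ and $B_2^0 = 1 - B_1^0 > \mathbb{B}_2$,
the composition outputs an infinite sequence of configurations, denoted $\mathit{out}(\tau_1 \bowtie \tau_2)$,
where $\mathit{out}(\tau_1 \bowtie \tau_2) = (v^0, B_1^0)(v^1, B_1^1) \ldots \in C^\omega$ such that for every $k \ge 0$,
$(v^{k+1}, B_1^{k+1}) = \tau_1 \bowtie \tau_2\big((v^0, B_1^0) \ldots (v^k, B_1^k)\big)$. We will say that $\tau_1 \bowtie \tau_2$ satisfies a
given objective $\Phi$, written $\tau_1 \bowtie \tau_2 \models \Phi$, iff $\mathit{proj}_V(\mathit{out}(\tau_1 \bowtie \tau_2)) \in \Phi$.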
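Definition 3 can be simulated directly when both tenders use memoryless action policies and bidding policies of the form $\beta(v, B)$. The sketch below is a hypothetical illustration: tenders are plain Python tuples, budgets are exact fractions, ties ($b_1 = b_2$) are resolved in favour of the first tender (a tie-breaking choice assumed by the sketch), and the run is truncated after a fixed number of rounds.

```python
from fractions import Fraction

def compose(tender1, tender2, v0, budget1, steps):
    """Simulate tau1 ⋈ tau2 from (v0, budget1) for `steps` rounds; return the configurations.

    Each tender is (alpha, beta, threshold): alpha maps a vertex to a successor (memoryless),
    and beta(v, budget) returns a bid in [0, budget].
    """
    (alpha1, beta1, th1), (alpha2, beta2, th2) = tender1, tender2
    if not th1 + th2 < 1:
        raise ValueError("tenders are not compatible")
    if not (budget1 > th1 and 1 - budget1 > th2):
        raise ValueError("initial budgets must exceed the thresholds")
    v, b1 = v0, budget1
    configs = [(v, b1)]
    for _ in range(steps):
        bid1, bid2 = beta1(v, b1), beta2(v, 1 - b1)
        if not (0 <= bid1 <= b1 and 0 <= bid2 <= 1 - b1):
            raise ValueError("a bid exceeds the available budget")
        if bid1 >= bid2:                  # tender 1 wins (ties go to tender 1: an assumption)
            v, b1 = alpha1[v], b1 - bid1  # and pays its bid to tender 2
        else:                             # tender 2 wins and pays its bid to tender 1
            v, b1 = alpha2[v], b1 + bid2
        configs.append((v, b1))
    return configs

# Hypothetical usage: from a, tender 1 heads to b and tender 2 heads to c.
half = Fraction(1, 2)
t1 = ({"a": "b", "b": "b", "c": "c"}, lambda v, B: min(B, half) if v == "a" else 0, half)
t2 = ({"a": "c", "b": "b", "c": "c"}, lambda v, B: B if v == "a" else 0, Fraction(2, 5))
run = compose(t1, t2, "a", Fraction(11, 20), steps=2)
print([v for v, _ in run], run[-1][1])  # -> ['a', 'b', 'b'] 1/20
```

Since the winner always pays its bid to the other tender, the two budgets keep summing to 1, so tracking only the first tender's budget suffices, exactly as in the configurations of Definition 3.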


We will often use the index $i \in \{1,2\}$ to refer to either of the two tenders or
their attributes, and will use $-i = 3 - i$ for the "other" one, e.g., $\tau_i$ and $\tau_{-i}$.
Notice the difference between $\mathbb{B}_i$ and $B_i^0$: $\mathbb{B}_i$ is the threshold budget at $v^0$, which
is a constant attribute of $\tau_i$, whereas $B_i^0$ is the actual budget initially allocated
to $\tau_i$, whose value can be anything above $\mathbb{B}_i$.


3.1 Classes of decentralized synthesis problems


In this section, we describe the three classes of decentralized synthesis problems
that we study. Throughout this section, fix a graph $G$ and a pair of overlapping
objectives $\Phi_1$ and $\Phi_2$.

Strong decentralized synthesis. Here, tenders make no assumptions about each
other; thus, the solutions provide the strongest (most robust) guarantees.
Formally, for each $i \in \{1,2\}$, the goal is to construct $\tau_i$ such that for every
compatible $\tau_{-i}$, we have $\tau_i \bowtie \tau_{-i} \models \Phi_i$.


[Fig. 1 panels: (a) Strong, (b) Assume-admissible, (c) Assume-guarantee]


Fig. 1: Graphs with two reachability objectives given by targets: $T_{\mathit{blue}}$, depicted
in blue, $T_{\mathit{red}}$, depicted in red, and $T_{\mathit{blue}} \cap T_{\mathit{red}}$, depicted in purple. The action
policies of the red and blue tenders choose edges with, respectively, red and blue
shadows (shared edges are in purple). If no edges from a vertex have a red or blue
shadow, then the respective tender is indifferent about the choice made. Thick
edges depict the paths taken by the compositions of tenders.
Example 1. Consider the graph depicted in Fig. 1a with a pair of reachability
objectives having the targets $T_{\mathit{blue}} = \{c, d, g\}$ and $T_{\mathit{red}} = \{d, f\}$, respectively.
Their intersection $\{d\}$ is depicted in purple. We present a pair of robust tenders
$\tau_{\mathit{blue}}$ and $\tau_{\mathit{red}}$ with $\mathbb{B}_{\mathit{blue}} = 1/4$ and $\mathbb{B}_{\mathit{red}} = 1/2$, so that $\tau_{\mathit{blue}}$ and $\tau_{\mathit{red}}$ are compatible.
We will show that $\tau_{\mathit{blue}}$ guarantees that, no matter which compatible tender it is
composed with, $T_{\mathit{blue}}$ is eventually reached, and similarly $\tau_{\mathit{red}}$ ensures that $T_{\mathit{red}}$
is reached. Therefore, $\tau_{\mathit{blue}} \bowtie \tau_{\mathit{red}}$ ensures that $d$ is reached.

We first describe $\tau_{\mathit{blue}}$. Consider an initial configuration $(a, 1/4 + \epsilon)$, for any
$\epsilon > 0$. Note that the other tender's budget is $3/4 - \epsilon$. The first choice of $\tau_{\mathit{blue}}$ is
$(b, 1/4)$, i.e., it proposes to move to $b$ and bids $1/4$. There are two possibilities. First,
$\tau_{\mathit{blue}}$ wins the bidding; then we reach the configuration $(b, \epsilon)$, and since both
successors of $b$ are in $T_{\mathit{blue}}$, the objective is satisfied in the next step. Second,
$\tau_{\mathit{blue}}$ loses the bidding, meaning that the other tender bids at least $1/4$, and in
the worst case, we proceed to the configuration $(e, 1/2 + \epsilon)$. Next, $\tau_{\mathit{blue}}$ chooses
$(g, 1/2)$ and necessarily wins, as $\tau_{\mathit{red}}$'s budget is only $1/2 - \epsilon$, and we reach
$g \in T_{\mathit{blue}}$. We stress that $\tau_{\mathit{blue}}$ can be entirely oblivious about $\tau_{\mathit{red}}$, except for
the implicit knowledge of $\tau_{\mathit{red}}$'s budget.

We now describe $\tau_{\mathit{red}}$. Consider an initial configuration $(a, 1/2 + \epsilon)$, for any $\epsilon > 0$.
Initially, $\tau_{\mathit{red}}$ bids 0, because it does not have a preference between going left or
right. In the worst case, its budget stays $1/2 + \epsilon$ in the next turn. Since both $b$
and $e$ have a single successor in $T_{\mathit{red}}$, $\tau_{\mathit{red}}$ must win the bidding. It does so
by bidding $1/2$, which exceeds the available budget $1/2 - \epsilon$ of the other tender. △
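The two tenders of Example 1 can be replayed with a small simulation. Since Fig. 1a is not reproduced here, the edge set below is an assumption reconstructed from the text of the example ($a$ has successors $b$ and $e$, $b$ has successors $c$ and $d$, $e$ has successors $f$ and $g$, and the sinks carry self-loops). The bids follow the description above, $\epsilon$ is fixed to $1/10$, the adversary is hypothetical, and ties are resolved in favour of the first tender, which is also an assumption of the sketch.

```python
from fractions import Fraction as F

# Assumed edges of Fig. 1a, reconstructed from the text of Example 1 (sinks get self-loops).
EDGES = {"a": ["b", "e"], "b": ["c", "d"], "e": ["f", "g"],
         "c": ["c"], "d": ["d"], "f": ["f"], "g": ["g"]}
T_BLUE, T_RED = {"c", "d", "g"}, {"d", "f"}
LOOPS = {v: v for v in "cdfg"}

def play(t1, t2, budget1, steps=4, v="a"):
    """Path of t1 ⋈ t2 from (v, budget1); ties go to t1 (an assumption of this sketch)."""
    (alpha1, beta1), (alpha2, beta2) = t1, t2
    path = [v]
    for _ in range(steps):
        bid1, bid2 = beta1(v, budget1), beta2(v, 1 - budget1)
        if bid1 >= bid2:
            nxt, budget1 = alpha1[v], budget1 - bid1
        else:
            nxt, budget1 = alpha2[v], budget1 + bid2
        if nxt not in EDGES[v]:
            raise ValueError("policies must follow edges")
        v = nxt
        path.append(v)
    return path

# tau_blue: bid 1/4 at a (go to b), bid 1/2 at e (go to g), bid 0 elsewhere.
blue = ({"a": "b", "b": "c", "e": "g", **LOOPS},
        lambda v, B: min(B, {"a": F(1, 4), "e": F(1, 2)}.get(v, F(0))))
# tau_red: indifferent at a (bid 0), bid 1/2 at b (go to d) and at e (go to f).
red = ({"a": "b", "b": "d", "e": "f", **LOOPS},
       lambda v, B: min(B, {"b": F(1, 2), "e": F(1, 2)}.get(v, F(0))))
# A hypothetical adversary that outbids tau_blue at a and then bids everything it has.
adversary = ({"a": "e", "b": "c", "e": "f", **LOOPS},
             lambda v, B: min(B, F(26, 100)) if v == "a" else B)

eps = F(1, 10)
print(play(blue, red, F(1, 4) + eps))        # -> ['a', 'b', 'd', 'd', 'd']
print(play(blue, adversary, F(1, 4) + eps))  # -> ['a', 'e', 'g', 'g', 'g']
```

In the composition with $\tau_{\mathit{red}}$ the run ends in $d \in T_{\mathit{blue}} \cap T_{\mathit{red}}$, while against the adversary $\tau_{\mathit{blue}}$ collects the adversary's winning bid at $a$ and can then afford to win at $e$.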


We now use the same problem as in Ex. 1 to show that the conventional
turn-based interaction may fail to fulfill both objectives.


Example 2. Consider again the graph depicted in Fig. 1a with the targets $T_{\mathit{blue}} =
\{c, d, g\}$ and $T_{\mathit{red}} = \{d, f\}$. Suppose $\alpha_{\mathit{blue}}$ and $\alpha_{\mathit{red}}$ are the two respective action
policies, and we arbitrarily decide to make their interaction turn-based, where
$\alpha_{\mathit{red}}$ chooses actions in $a$ and $\alpha_{\mathit{blue}}$ chooses actions in $b$ and $e$. Clearly, no
matter which edge $\alpha_{\mathit{red}}$ chooses from $a$, it cannot guarantee reaching $T_{\mathit{red}}$,
because $\alpha_{\mathit{blue}}$ can take the path to $c$ or $g$, depending on $\alpha_{\mathit{red}}$'s choice. △


Assume-admissible decentralized synthesis. While the guarantees of strong
decentralized synthesis are appealing, it often fails, as each tender makes the
pessimistic assumption that the other tender can behave arbitrarily, even
adversarially. We consider admissibility [23] as a stronger assumption based on
rationality, which allows compatible tenders to exist even when strong synthesis
fails. We illustrate the idea in the following example.


Example 3. Consider the graph in Fig. 1b with reachability objectives given by
targets $T_{\mathit{blue}} = \{d, g\}$ and $T_{\mathit{red}} = \{d, f\}$. We argue that strong decentralized
synthesis is not possible. Indeed, using the same reasoning as for $T_{\mathit{red}}$ in Ex. 1, we have
$\mathit{Th}_{\mathit{blue}}(a) = \mathit{Th}_{\mathit{red}}(a) = 0.5$. On the other hand, observe that when synthesizing
$\tau_{\mathit{red}}$, since $c \notin T_{\mathit{blue}}$, we know that a "rational" $\tau_{\mathit{blue}}$ (formally, an admissible $\tau_{\mathit{blue}}$;
see Sec. 6) will not proceed from $b$ to $c$, and we can omit this edge. In turn, the
threshold in $a$ decreases to $1/4$ for both objectives. Since the sum of thresholds
is now less than 1, two compatible tenders can be obtained. △
In general, we seek an admissible-winning tender, which ensures that its
objective is satisfied when composed with any admissible tender. Admissible-winning
tenders are modular because they can be reused as long as the set
of admissible actions of the other tender remains unchanged. For example, even
when vertex $g$ is added to the red target set, the blue tender can be used
without change. Somewhat surprisingly, we show that in graphs in which all vertices
have out-degree at most 2, assume-admissible decentralized synthesis is always
possible, and a pair of admissible-winning tenders can be found in PTIME.

Assume-guarantee decentralized synthesis. Sometimes, even the admissibility
assumption is too weak, and we need a more direct synchronization of the
tenders. We consider assume-guarantee decentralized synthesis, where each tender
needs to respect a pre-specified contract, and as a result, their composition
satisfies both objectives. We illustrate the idea below.


Example 4. Consider the graph depicted in Fig. 1c with reachability objectives
given by targets $T_{\mathit{blue}} = \{c, d, g\}$ and $T_{\mathit{red}} = \{d, f\}$. Here, the strong decentralized
synthesis fails for reasons similar to those in Ex. 3. The assume-admissible
decentralized synthesis fails because from $e$, the two objectives cannot both be
fulfilled, and from $b$, whichever tender wins the bidding can use an admissible edge that
violates the other objective (e.g., $(b, c)$ is admissible for $\tau_{\mathit{blue}}$ but violates $T_{\mathit{red}}$).
We consider the contract $(G_{\mathit{blue}}, G_{\mathit{red}}) = (\mathsf{G}\,\neg c, \mathsf{G}\,\neg f)$, which is satisfied when
(a) if $\alpha_{\mathit{blue}}$ fulfills $G_{\mathit{blue}}$, then $\alpha_{\mathit{red}}$ fulfills $G_{\mathit{red}}$, and (b) if $\alpha_{\mathit{red}}$ fulfills $G_{\mathit{red}}$, then
$\alpha_{\mathit{blue}}$ fulfills $G_{\mathit{blue}}$. Now, whichever tender wins the bidding at $b$ needs to fulfill
its guarantee, because it cannot judge from the past interaction whether the other
tender violates its guarantee. Therefore, from $b$, the next vertex will be $d$ under the
contract, and using the same tenders as in Ex. 3, both objectives will be fulfilled.


4 An Aside on Bidding Games on Graphs 


All our synthesis algorithms internally solve bidding games, which we briefly
review here; see the survey [8] for more details. A (two-player) bidding game
is played between Player X and Player Y and is a tuple $(G, \Phi)$, where $G =
(V, E)$ is a finite directed graph and $\Phi \subseteq V^\omega$ is the objective of Player X.
The game is zero-sum, meaning that the objective of Player Y is $V^\omega \setminus \Phi$, i.e.,
the violation of $\Phi$. This differs from auction-based scheduling, where objectives
overlap; otherwise, the interaction between Player X and Player Y is the same
as the one between tenders. A strategy for a player is a pair $(\alpha, \beta)$, where $\alpha$ is an
action policy and $\beta$ is a bidding policy. As in the composition of tenders, two
strategies and an initial configuration $(v, B)$ give rise to an infinite sequence of
configurations called a play. A strategy is winning if, no matter which strategy
the opponent follows, the play satisfies the player's objective. A central quantity
in bidding games is the threshold budget in a vertex $v$, which is, intuitively, the
necessary and sufficient initial budget for Player X to guarantee winning.


Definition 4 (Threshold budgets). Consider a bidding game $(G, \Phi)$ with $G =
(V, E)$. The threshold of Player X is given by $\mathit{Th}^X_{\Phi}\colon V \to [0, 1]$, where for every
$v \in V$, we have $\mathit{Th}^X_{\Phi}(v) = \inf\{B \mid \text{Player X has a winning strategy from } (v, B)\}$.

The threshold of Player Y is denoted by $\mathit{Th}^Y_{\Phi}(v)$. The following theorem
characterizes the structure of thresholds and states that the two players'
thresholds are complementary. The intuition can be found in the full version [17], where
we also show how winning strategies can be constructed from thresholds.


Theorem 1 ([38]). Consider a reachability bidding game $(G, \Phi)$ where $\Phi$ is
$\mathit{Reach}_G(T)$, where, without loss of generality, $T$ is a given set of sink vertices. For
every vertex $v$, we have $\mathit{Th}^X_{\Phi}(v) = 1 - \mathit{Th}^Y_{\Phi}(v)$. Moreover, for every sink vertex $t$,
we have $\mathit{Th}^X_{\Phi}(t) = 0$ if $t \in T$, and $\mathit{Th}^X_{\Phi}(t) = 1$ otherwise. For every non-sink vertex $v$, we
have $\mathit{Th}^X_{\Phi}(v) = 0.5 \cdot (\mathit{Th}^X_{\Phi}(v^+) + \mathit{Th}^X_{\Phi}(v^-))$, where $v^-$ and $v^+$ are successors of $v$
such that for every other successor $u$, we have $\mathit{Th}^X_{\Phi}(v^-) \le \mathit{Th}^X_{\Phi}(u) \le \mathit{Th}^X_{\Phi}(v^+)$.
Verifying whether $\mathit{Th}^X_{\Phi}(v) > 0.5$ for a given vertex $v$ is in NP $\cap$ coNP in general and
in PTIME for binary graphs.
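On an acyclic graph whose sinks include all targets, the characterization in Theorem 1 can be evaluated bottom-up: sinks get 0 or 1, and every other vertex gets the average of its largest and smallest successor thresholds. The sketch below assumes such an acyclic graph (graphs with cycles need the algorithms cited above) and applies it to an edge set for Fig. 1a that we reconstructed from the text of Example 1, which is an assumption, since the figure itself is not reproduced here. Here sinks simply have no successors, as in Theorem 1.

```python
from fractions import Fraction
from functools import lru_cache

def reach_thresholds(succ, targets):
    """Player X thresholds for Reach(targets) via the recursion of Theorem 1.

    Assumes `succ` describes an acyclic graph whose sinks (empty successor lists)
    include all target vertices.
    """
    @lru_cache(maxsize=None)
    def th(v):
        if not succ[v]:
            return Fraction(0) if v in targets else Fraction(1)
        values = [th(u) for u in succ[v]]
        return (max(values) + min(values)) / 2   # average of Th(v+) and Th(v-)
    return {v: th(v) for v in succ}

# Edges of Fig. 1a as reconstructed from Example 1 (an assumption); c, d, f, g are sinks.
fig1a = {"a": ["b", "e"], "b": ["c", "d"], "e": ["f", "g"],
         "c": [], "d": [], "f": [], "g": []}
blue = reach_thresholds(fig1a, {"c", "d", "g"})
red = reach_thresholds(fig1a, {"d", "f"})
print(blue["a"], red["a"], 1 - red["a"])   # -> 1/4 1/2 1/2
```

The printed values reproduce the thresholds $1/4$ and $1/2$ used for $\tau_{\mathit{blue}}$ and $\tau_{\mathit{red}}$ in Example 1; the last value is the complementary Player Y threshold $1 - \mathit{Th}^X_{\Phi}(a)$ for the red objective.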


For infinite-horizon objectives, like parity, it is known that eventually one of
the BSCCs will be reached, and inside every BSCC, each player can ensure that
every vertex is visited infinitely often, with an arbitrary initial budget. This implies
that for every parity objective, the threshold of every vertex inside every BSCC
of a game graph is either 0 or 1, and fulfilling a given parity objective is equivalent
to reaching a BSCC whose vertices all have threshold 0. We state this formally.


Theorem 2 ([10]). Consider a bidding game $(G, \Phi)$ with a parity objective $\Phi$.
Let $S$ be a BSCC of $G$. Every vertex in $S$ has threshold either 0 or 1, and it
is 1 iff the highest parity index in $S$ is odd. Moreover, for a vertex $v$ not in a
BSCC, we have $\mathit{Th}^X_{\Phi}(v) = \mathit{Th}^X_{\mathit{Reach}_G(T)}(v)$, where $T$ is the union of BSCCs whose
vertices have threshold 0.
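Theorem 2 suggests a simple preprocessing step for parity objectives: compute the BSCCs, keep those whose highest colour is even, and hand their union $T$ to a reachability solver. The sketch below shows only this reduction step; it assumes an adjacency-dict graph in which every vertex is a key, uses a deliberately simple quadratic BSCC test, and leaves the threshold computation on the resulting reachability game to a separate solver. The function names and the example graph are hypothetical.

```python
def reachable_from(succ, v):
    """Vertices reachable from v, including v itself."""
    seen, stack = {v}, [v]
    while stack:
        for w in succ[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def parity_target(succ, col):
    """Union T of the BSCCs whose highest colour is even (threshold 0 by Theorem 2)."""
    reach = {v: reachable_from(succ, v) for v in succ}
    target = set()
    for v in succ:
        in_bscc = all(v in reach[u] for u in reach[v])  # v can be reached back from everywhere
        if in_bscc and max(col[u] for u in reach[v]) % 2 == 0:
            target |= reach[v]                          # reach[v] is exactly v's BSCC
    return target

# Hypothetical game graph: from a, Player X wants the cycle {c, d} (highest colour 2).
succ = {"a": ["b", "c"], "b": ["b"], "c": ["d"], "d": ["c"]}
col = {"a": 5, "b": 1, "c": 2, "d": 1}
print(sorted(parity_target(succ, col)))   # -> ['c', 'd']
```

Collapsing each BSCC into a sink then gives a reachability game of the form in Theorem 1; in this hypothetical graph, vertex $a$ would get threshold $(0 + 1)/2 = 1/2$.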


5 Strong Decentralized Synthesis 


We study the strong decentralized synthesis problem, where the goal is to syn- 
thesize two compatible robust tenders, i.e., tenders that guarantee the fulfillment 
of their objectives when composed with any compatible tender. 


Definition 5 (Robust tenders). Let $G$ be a graph and $\Phi_i$ be an objective in
$G$. A tender $\tau_i$ is robust for $\Phi_i$ if for every compatible tender $\tau_{-i} \in \mathcal{T}^G$,
we have $\tau_i \bowtie \tau_{-i} \models \Phi_i$.


Problem 1 (STRONG-SYNT). Define STRONG-SYNT as the problem whose
input is a tuple $(G, \Phi_1, \Phi_2)$, where $G$ is a graph and $\Phi_1$ and $\Phi_2$ are overlapping
$\omega$-regular objectives in $G$, and the goal is to decide whether there exists a pair of
tenders $\tau_1, \tau_2 \in \mathcal{T}^G$ such that:


(I) $\tau_1$ and $\tau_2$ are compatible,
(II) $\tau_1$ is robust for $\Phi_1$, and
(III) $\tau_2$ is robust for $\Phi_2$.


Since each robust tender $\tau_i$ guarantees that $\Phi_i$ is satisfied when composed
with any compatible tender, the composition of two compatible robust tenders
satisfies both objectives:

Proposition 1 (Sound composition of robust tenders). Let $\tau_1$ and $\tau_2$ be
two compatible robust tenders for $(G, \Phi_1, \Phi_2)$. Then $\tau_1 \bowtie \tau_2 \models \Phi_1 \cap \Phi_2$.


We reduce the strong decentralized synthesis problem to the solution of two independent bidding games, both played on the graph $G$, one with Player X's objective $\varphi_1$ and the other with Player X's objective $\varphi_2$. When the sum of the thresholds at $v^0$ is less than 1, we set the two tenders to be winning Player X strategies in the two games, with the budgets of the tenders set to the respective thresholds at $v^0$. It follows from the construction that both tenders are robust, and hence their composition fulfills both objectives (Prop. 1).
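To make the reduction tangible, here is a minimal Python sketch for reachability objectives. It assumes an acyclic graph whose sinks are either targets or losing vertices (one-way grids such as those of Fig. 2 are of this kind), and it relies on the standard Richman characterization of reachability thresholds: $\mathrm{Th}(v) = 0$ on targets, $\mathrm{Th}(v) = 1$ on non-target sinks, and otherwise $\mathrm{Th}(v) = (\mathrm{Th}(v^+) + \mathrm{Th}(v^-))/2$, where $v^+$ and $v^-$ are the successors with the largest and smallest threshold. The function names and the toy graph are hypothetical; cyclic graphs need a fixed-point computation instead of this memoized recursion.

```python
from fractions import Fraction
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]  # hypothetical encoding: vertex -> successors; [] marks a sink


def reach_thresholds(graph: Graph, targets: Set[str]) -> Dict[str, Fraction]:
    """Richman thresholds of Player X for Reach(targets) on an ACYCLIC graph."""
    memo: Dict[str, Fraction] = {}

    def th(v: str) -> Fraction:
        if v not in memo:
            if v in targets:
                memo[v] = Fraction(0)  # target already reached
            elif not graph[v]:
                memo[v] = Fraction(1)  # non-target sink: target unreachable
            else:
                values = [th(w) for w in graph[v]]
                memo[v] = (max(values) + min(values)) / 2  # average of v+ and v-
        return memo[v]

    return {v: th(v) for v in graph}


def strong_synt(graph: Graph, v0: str, t1: Set[str], t2: Set[str]) -> Tuple[bool, Fraction, Fraction]:
    """Sum-of-thresholds test: robust tenders exist iff Th1(v0) + Th2(v0) < 1."""
    th1 = reach_thresholds(graph, t1)[v0]
    th2 = reach_thresholds(graph, t2)[v0]
    return th1 + th2 < 1, th1, th2


# Hypothetical example: t12 is a target of both objectives, so they overlap.
g = {"v0": ["a", "t12"], "a": ["t1", "t2"], "t1": [], "t2": [], "t12": []}
print(strong_synt(g, "v0", {"t1", "t12"}, {"t2", "t12"}))
# (True, Fraction(1, 4), Fraction(1, 4))
```

Exact fractions are used because thresholds on such graphs are dyadic rationals, so the strict comparison with 1 is not affected by rounding.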


Theorem 3 (Strong decentralized synthesis). Let $G = (V, v^0, E)$ be a graph and $\varphi_1$ and $\varphi_2$ be a pair of overlapping ω-regular objectives. A pair of robust tenders exists iff $\mathrm{Th}^G_{\varphi_1}(v^0) + \mathrm{Th}^G_{\varphi_2}(v^0) < 1$. Moreover, STRONG-SYNT is in NP ∩ coNP in general and in PTIME for binary graphs.


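Proof. First, assume that $\mathrm{Th}^G_{\varphi_1}(v^0) + \mathrm{Th}^G_{\varphi_2}(v^0) < 1$. For $i \in \{1,2\}$, let $(\alpha_i, \beta_i)$ denote a winning Player X strategy in the bidding game $(G, \varphi_i)$ from every configuration $(v^0, B)$ with $B > \mathrm{Th}^G_{\varphi_i}(v^0)$. We argue that the tender $\tau_1 = (\alpha_1, \beta_1, \mathrm{Th}^G_{\varphi_1}(v^0))$ is robust for $\varphi_1$; the proof for $\tau_2$ is dual. Indeed, for any compatible tender $\tau_2' = (\alpha_2', \beta_2', B_2')$, the pair $(\alpha_2', \beta_2')$ corresponds to a Player Y strategy in the bidding game $(G, \varphi_1)$. The resulting play coincides with $\mathrm{out}(\tau_1 \bowtie \tau_2')((v^0, B))$ and satisfies $\varphi_1$, since the strategy $(\alpha_1, \beta_1)$ is winning.

Second, suppose that $\mathrm{Th}^G_{\varphi_1}(v^0) + \mathrm{Th}^G_{\varphi_2}(v^0) \geq 1$. For any allocation with $B_1 + B_2 < 1$, there is an $i \in \{1,2\}$ such that $B_i < \mathrm{Th}^G_{\varphi_i}(v^0)$. Assume w.l.o.g. that $B_1 < \mathrm{Th}^G_{\varphi_1}(v^0)$. Consider a winning Player Y strategy $(\alpha_2', \beta_2')$ in the bidding game $(G, \varphi_1)$ from $(v^0, B_1)$. The tender $\tau_2' = (\alpha_2', \beta_2', 1 - B_1)$ is compatible, and $\mathrm{out}(\tau_1 \bowtie \tau_2')((v^0, B_1))$ violates $\varphi_1$.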

Finally, to obtain the complexity bounds, we guess memoryless action policies in both games, which are known to exist [37], and verify that they are optimal. Based on the guess, we devise a linear program to compute the thresholds. We then verify that the sum of the thresholds at $v^0$ is less than 1. For binary graphs, there is no need to guess the action policies in order to find the thresholds (Thm. 1).


We identify a setting where strong decentralized synthesis is always possible. 
The following theorem follows from the result that threshold budgets in strongly- 
connected Büchi games containing at least one accepting vertex are 0. 


Theorem 4 (Strong decentralized synthesis on SCCs). Consider a strongly- 
connected graph G and a pair of non-empty Büchi objectives in G. Then, a pair 
of robust tenders exists in G. 


We demonstrate the effectiveness of strong synthesis using path-planning problems with two reachability objectives. Consider a fixed grid with four different instances of the problem, as shown in Fig. 2. For the first three instances, we successfully obtain pairs of robust tenders whose compositions fulfill both objectives. Moreover, since the blue target remains the same in all cases, we needed to redesign only the red tender, saving a significant amount of computation.
[Fig. 2 panels: four instances on the same 8 × 8 grid, columns A to H, rows 1 to 8.]


Fig. 2: Robust tenders for path planning with two reachability objectives on a one-way grid, where the black cells are obstacles and the only permissible moves are from lighter to darker green cells (and not the other way round). The cell B8 is the initial location. The cells with red double circles (G7, E5, E3, and C3, respectively, in the four instances) and the cell with a blue double circle (G1) are the targets to reach. The path shows the output of the composition of the two tenders, where the red and blue segments are actions chosen by the red and blue tenders, respectively. The cells with red and blue squares are locations where the respective tenders win the bidding; in the remaining cells on the paths, the bidding ended in ties, which were resolved randomly. Strong synthesis succeeded in the first three instances and failed in the last one. The pairs of thresholds for the red and blue targets are, respectively (left to right): (0.75, 0.125), (0.625, 0.125), (0.75, 0.125), and (0.875, 0.125).
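By Theorem 3, strong synthesis succeeds exactly when the two thresholds at the initial cell sum to strictly less than 1. The short Python check below only reuses the threshold pairs listed in the caption (written as exact fractions) and confirms that this criterion matches the reported outcomes.

```python
from fractions import Fraction as F

# (red, blue) thresholds at B8 for the four instances, copied from the Fig. 2 caption.
instances = [(F(3, 4), F(1, 8)), (F(5, 8), F(1, 8)), (F(3, 4), F(1, 8)), (F(7, 8), F(1, 8))]

# Theorem 3: a pair of robust tenders exists iff the sum is strictly below 1.
verdicts = [red + blue < 1 for red, blue in instances]
print(verdicts)  # [True, True, True, False]
```

The last instance sums to exactly 7/8 + 1/8 = 1, which is why strong synthesis fails there.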


6 Assume-Admissible Decentralized Synthesis 


In assume-admissible decentralized synthesis, each tender assumes that the other tender is rational and pursues its own objective. We formalize rationality by adapting the well-known concepts of dominance and admissibility from game theory [1,22]. Intuitively, $\tau_i$ dominates $\tau_i'$ if $\tau_i$ is always at least as good as $\tau_i'$ and sometimes strictly better; therefore, there is no reason to use $\tau_i'$. An admissible tender is one that is not dominated by any other tender.


Definition 6 (Dominance, admissibility). Let $G$ be a graph and $\varphi$ be an objective. We provide definitions for the first tender; the definitions for the second tender are dual. Let $B_1 < 1$. For two tenders $\tau_1$ and $\tau_1'$ that have the same budget allocation, $\tau_1$ dominates $\tau_1'$ when


a) $\tau_1$ performs as well as $\tau_1'$ when composed with any compatible $\tau_2$; formally, for every compatible tender $\tau_2$, $\tau_1' \bowtie \tau_2 \models \varphi$ implies $\tau_1 \bowtie \tau_2 \models \varphi$, and
b) there is a compatible tender $\tau_2$ for which $\tau_1$ performs better than $\tau_1'$; formally, there exists a compatible $\tau_2$ with $\tau_1 \bowtie \tau_2 \models \varphi$ and $\tau_1' \bowtie \tau_2 \not\models \varphi$.


A tender $\tau_1$ is called $\varphi$-admissible in $G$ iff it is not dominated by any other tender. We denote the set of $\varphi$-admissible tenders in $G$ by $\mathrm{Adm}^G(\varphi)$.


Next, we define admissible-winning tenders, which are tenders that fulfill 
their objectives when composed with any admissible tender. 
Definition 7 (Admissible-winning tenders). Let $G$ be a graph and $\varphi_1, \varphi_2$ be a pair of overlapping objectives in $G$. A tender $\tau_i$ is called $\varphi_{-i}$-admissible-winning for $\varphi_i$ if and only if $\tau_i \in \mathrm{Adm}^G(\varphi_i)$, and for every other tender $\tau_{-i} \in \mathrm{Adm}^G(\varphi_{-i})$ compatible with $\tau_i$, we have $\tau_i \bowtie \tau_{-i} \models \varphi_i$.


When the objectives are clear from the context, we omit them and simply write “admissible tender,” “admissible-winning tender,” etc.


Problem 2 (AA-SYNT). Define AA-SYNT as the problem whose input is a tuple $(G, \varphi_1, \varphi_2)$, where $G$ is a graph and $\varphi_1$ and $\varphi_2$ are overlapping ω-regular objectives in $G$, and the goal is to decide whether there exists a pair of tenders $\tau_1 \in \mathrm{Adm}^G(\varphi_1)$ and $\tau_2 \in \mathrm{Adm}^G(\varphi_2)$ such that:


(I) $\tau_1$ and $\tau_2$ are compatible,
(II) $\tau_1$ is $\varphi_2$-admissible-winning for $\varphi_1$, and
(III) $\tau_2$ is $\varphi_1$-admissible-winning for $\varphi_2$.


The following proposition follows from the requirement that $\tau_1$ and $\tau_2$ are admissible.


Proposition 2 (Sound composition of admissible-winning tenders). Let $\tau_1$ and $\tau_2$ be tenders that fulfill the requirements stated in Prob. 2. Then, $\tau_1 \bowtie \tau_2 \models \varphi_1 \cap \varphi_2$.


Remark 1. Note that the synthesis procedure for each $\varphi_{-i}$-admissible-winning tender $\tau_i$ for $\varphi_i$ requires knowledge of $\mathrm{Adm}^G(\varphi_{-i})$. Assume-admissible decentralized synthesis is modular in the following sense. First, the specific implementation of the tender $\tau_{-i}$ with which each $\tau_i$ is composed is not known during synthesis; all that is known is the objective $\varphi_{-i}$ for which $\tau_{-i}$ is synthesized. Second, each $\tau_i$ can remain unchanged even when $\varphi_{-i}$ changes to $\varphi_{-i}'$, as long as $\mathrm{Adm}^G(\varphi_{-i}') \subseteq \mathrm{Adm}^G(\varphi_{-i})$.


6.1 Reachability objectives 


Throughout this section we focus on overlapping reachability objectives $\varphi_1 = \mathrm{Reach}_G(T_1)$ and $\varphi_2 = \mathrm{Reach}_G(T_2)$ with $T_1, T_2 \subseteq V$ being sets of sink target vertices. This is without loss of generality, as every graph with non-sink target vertices can be converted into a graph with sink target vertices by adding memory (see the full version [17]).

We reduce the decentralized assume-admissible synthesis problem to solving a pair of zero-sum bidding games on a sub-graph of $G$. Intuitively, an edge $e = (u, v)$ is dominated for the $i$-th tender, for $i \in \{1,2\}$, if it is possible to achieve the objective $\varphi_i$ from $u$ but not from $v$. Clearly, a tender that chooses $e$ is dominated and is thus not admissible (see the proof of the lemma in the full version [17]). Recall that $\mathrm{Th}^G_{\varphi_i}(v)$ denotes the threshold in the zero-sum bidding game played on $G$ with the Player X objective $\varphi_i$, and that $\mathrm{Th}^G_{\varphi_i}(v) = 1$ means there is no path from $v$ to $T_i$.
Lemma 1 (A necessary condition for admissibility). For every vertex $u$ having at least two successors $v, w$ with $\mathrm{Th}^G_{\varphi_i}(v) < 1$ and $\mathrm{Th}^G_{\varphi_i}(w) = 1$, if a Player $i$ tender $(\alpha_i, \beta_i, B_i)$ is in $\mathrm{Adm}^G(\varphi_i)$, then $\alpha_i(u) \neq w$, for both $i \in \{1,2\}$.


Proof. We argue that choosing $w$ from $u$ is dominated by the action of choosing $v$ from $u$, no matter what the budget at $u$ is. Firstly, Cond. (a) of Def. 6 trivially holds. Secondly, consider the other tender $\tau_{-i}$ that bids zero at $u$ and later cooperates with $\tau_i$ to satisfy $\varphi_i$. Clearly, an action policy of $\tau_i$ that selects $v$ at $u$ will be able to satisfy $\varphi_i$, but one that selects $w$ will not.


We obtain the reduced graph by omitting edges that are dominated for both players. For example, the edge $(b, c)$ is dominated for both players in the graph of Ex. 3, whereas in the graph of the subsequent example no edge is dominated for both players.


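Definition 8 (Largest admissible sub-graphs for reachability). The largest admissible sub-graph of $G$ with respect to two reachability objectives $\varphi_1$ and $\varphi_2$ is $\widehat{G}_{\varphi_1,\varphi_2} = (V', E')$ with $V' = V \setminus \{v \in V \mid \mathrm{Th}^G_{\varphi_1}(v) = 1 \wedge \mathrm{Th}^G_{\varphi_2}(v) = 1\}$ and $E' = (V' \times V') \cap E$. We omit $\varphi_1, \varphi_2$ from $\widehat{G}_{\varphi_1,\varphi_2}$ when they are clear from the context.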
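Because $\mathrm{Th}^G_{\varphi_i}(v) = 1$ exactly when no path leads from $v$ to $T_i$, Def. 8 can be computed with two backward-reachability searches, without solving any bidding game. A minimal Python sketch, assuming graphs are dictionaries from vertices to successor lists; the function names and the example graph are hypothetical:

```python
from collections import deque
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # hypothetical encoding: vertex -> successors


def can_reach(graph: Graph, targets: Set[str]) -> Set[str]:
    """Vertices with a path to `targets`, i.e., those whose threshold is below 1."""
    preds: Dict[str, List[str]] = {v: [] for v in graph}
    for u, succs in graph.items():
        for w in succs:
            preds[w].append(u)
    seen = set(targets)
    queue = deque(targets)
    while queue:  # backward BFS from the targets
        w = queue.popleft()
        for u in preds[w]:
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return seen


def largest_admissible_subgraph(graph: Graph, t1: Set[str], t2: Set[str]) -> Graph:
    """Def. 8: drop vertices whose thresholds are 1 for both objectives."""
    keep = can_reach(graph, t1) | can_reach(graph, t2)
    return {v: [w for w in graph[v] if w in keep] for v in graph if v in keep}


# Hypothetical example: d is a trap from which neither target is reachable.
g = {"v0": ["a", "d"], "a": ["t1", "t12"], "d": ["d"], "t1": [], "t12": []}
print(largest_admissible_subgraph(g, {"t1", "t12"}, {"t12"}))
# {'v0': ['a'], 'a': ['t1', 't12'], 't1': [], 't12': []}
```

Removing `d` also removes the edge $(v^0, d)$, which is dominated for both players in the sense of Lemma 1.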


For a vertex $v$ in $G$ and $i \in \{1, 2\}$, recall that $\mathrm{Th}^G_{\varphi_i}(v)$ denotes the threshold in $G$ for objective $\varphi_i$, and $\mathrm{Th}^{\widehat{G}}_{\varphi_i}(v)$ denotes the threshold in the reduced graph. Observe that a winning strategy in $G$ never crosses a dominated edge. Removing dominated edges restricts the opponent; thus $\mathrm{Th}^G_{\varphi_i}(v) \geq \mathrm{Th}^{\widehat{G}}_{\varphi_i}(v)$. The next lemma shows that, surprisingly, the decrease in the sum of thresholds is guaranteed to be significant. The proof (see the full version [17]), which holds for non-binary graphs, intuitively follows from observing that in $\widehat{G}$ a sink that is a target for one of the players is necessarily reached, and since the objectives overlap in at least one sink, the sum of thresholds is at most 1.


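Lemma 2 (On the sum of thresholds in $\widehat{G}$). For every vertex $v$, we have $\mathrm{Th}^{\widehat{G}}_{\varphi_1}(v) + \mathrm{Th}^{\widehat{G}}_{\varphi_2}(v) \leq 1$. Moreover, if $G$ is binary, then $\mathrm{Th}^{\widehat{G}}_{\varphi_1}(v) + \mathrm{Th}^{\widehat{G}}_{\varphi_2}(v) < 1$.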


Our synthesis procedure proceeds as in strong decentralized synthesis: find and output a pair of robust tenders in $\widehat{G}$, which are guaranteed to exist when $G$ is binary. In order to maintain soundness (see Prop. 2), it is key to show that a robust tender $\tau_i$ in $\widehat{G}$ is admissible in $G$. The proof of the following lemma is intricate (see the full version [17]). We show that even when one can find $\tau_i'$ and $\tau_{-i}$ such that $\tau_i \bowtie \tau_{-i} \not\models \varphi_i$ but $\tau_i' \bowtie \tau_{-i} \models \varphi_i$, it is possible to construct $\tau_{-i}'$ for which $\tau_i \bowtie \tau_{-i}' \models \varphi_i$ but $\tau_i' \bowtie \tau_{-i}' \not\models \varphi_i$; thus $\tau_i'$ does not dominate $\tau_i$. Furthermore, such a tender wins against a set of tenders that over-approximates the admissible tenders for $\varphi_{-i}$.


Lemma 3 (Algorithm for computing admissible-winning tenders). For $i \in \{1,2\}$, a robust tender for $\varphi_i$ in $\widehat{G}$ is $\varphi_{-i}$-admissible-winning for $\varphi_i$ in $G$.


The following theorem is obtained by combining Lemmas 2 and 3.

Theorem 5 (Assume-admissible decentralized synthesis for reachability). The problem AA-SYNT is a tautology for binary graphs: for every binary graph and two overlapping reachability objectives, there exists a pair of compatible admissible-winning tenders. Moreover, the tenders can be computed in PTIME.


Remark 2. For general (i.e., non-binary) graphs, AA-SYNT is no longer a tautology; a counter-example is given in Ex. 4. However, the same PTIME algorithm for computing tenders can still be used to obtain a sound solution; the completeness question is left open for future work.


6.2 Büchi objectives 


In this section, we consider binary graphs with a pair of overlapping Büchi objectives. We first demonstrate that, unlike for reachability, an assume-admissible decentralized solution is not guaranteed to exist.


Example 5. Consider the graph depicted in Fig. 3 with the Büchi objectives given by the accepting vertices $S_{\mathit{red}} = \{b, d\}$ and $S_{\mathit{blue}} = \{a, c\}$. Note that the objectives are overlapping, since the path $(bc)^\omega$ satisfies both. We argue that no pair of compatible admissible-winning tenders exists. Note that a robust (hence dominant) red tender forces reaching $d$, thus forcing $\varphi_{\mathit{red}}$ to be satisfied. Dually, a robust blue tender forces reaching $a$. It can be shown that $\mathrm{Th}^G_{\varphi_{\mathit{red}}}(v^0) = 2/3$ and $\mathrm{Th}^G_{\varphi_{\mathit{blue}}}(v^0) = 1/3$. Thus, for any $B_{\mathit{red}}$ and $B_{\mathit{blue}}$ with $B_{\mathit{red}} + B_{\mathit{blue}} < 1$, there is a robust tender that violates the other tender's objective.


We generalize the concept of largest admissible subgraphs to Büchi objectives. It is not hard to show that proceeding into a BSCC with an accepting state is admissible. Indeed, Thm. 2 shows that there is a robust (hence admissible) tender in such BSCCs. On the other hand, proceeding to a BSCC with no accepting vertex is clearly not admissible. The largest admissible subgraph is obtained by repeatedly removing BSCCs that are not admissible for both tenders. Formally, for a given action policy $\alpha$ and a given vertex $v$ of $G$, we write $\alpha \not\models_v \varphi_1 \cup \varphi_2$ to indicate that the action policy cannot fulfill $\varphi_1 \cup \varphi_2$ from the initial vertex $v$.


Fig. 3: A graph with no assume-admissible decentralized solution.


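Definition 9 (Largest admissible sub-graphs for Büchi). The largest admissible sub-graph $\widehat{G}$ of $G$ for the Büchi objectives $\varphi_1, \varphi_2$ is the graph $(V', E')$ with $V' = V \setminus \{v \in V \mid \forall \text{ action policy } \alpha.\ \alpha \not\models_v \varphi_1 \cup \varphi_2\}$, and $E' = (V' \times V') \cap E$.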


We describe a reduction to reachability games. For $i \in \{1,2\}$, let $T_i$ denote the union of BSCCs of $\widehat{G}$ in which there is at least one Büchi accepting vertex for $\varphi_i$. We call $(T_1, T_2)$ the reachability core of $(\varphi_1, \varphi_2)$ in $\widehat{G}$. Let $\varphi_1' = \mathrm{Reach}_{\widehat{G}}(T_1)$ and $\varphi_2' = \mathrm{Reach}_{\widehat{G}}(T_2)$. We proceed as in strong decentralized synthesis: we find $\mathrm{Th}^{\widehat{G}}_{\varphi_1'}(v^0)$ and $\mathrm{Th}^{\widehat{G}}_{\varphi_2'}(v^0)$ and return a pair of robust tenders if their sum is strictly less than 1. Note that, unlike for reachability objectives, for Büchi objectives the sum might be 1, as in Ex. 5. Moreover, as Ex. 5 illustrates, when the sum is 1, no pair of admissible-winning tenders exists. By adapting results from the previous section, we obtain the following.


Theorem 6 (Assume-admissible decentralized synthesis for Büchi). Let $G$ be a binary graph and $\varphi_1, \varphi_2$ be a pair of overlapping Büchi objectives. Let $(T_1, T_2)$ be the reachability core of $(\varphi_1, \varphi_2)$ in the largest admissible sub-graph of $G$ (for $\varphi_1, \varphi_2$). A pair of admissible-winning tenders exists iff $T_1 \cap T_2 \neq \emptyset$. Moreover, AA-SYNT for Büchi objectives is in PTIME.
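The criterion of Theorem 6 is easy to check once the largest admissible sub-graph is available. The Python sketch below assumes that $\widehat{G}$ (Def. 9) has already been computed and is given as a dictionary from vertices to non-empty successor lists; it computes the reachability core with Tarjan's SCC algorithm and tests $T_1 \cap T_2 \neq \emptyset$. All names are hypothetical, and the example graph is an illustrative stand-in with the accepting sets of Ex. 5, not the graph of Fig. 3. The sketch only decides existence; constructing the tenders would additionally require the reachability thresholds.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]  # hypothetical encoding: vertex -> successors (non-empty)


def strongly_connected_components(graph: Graph) -> List[Set[str]]:
    """Tarjan's algorithm (recursive; adequate for small example graphs)."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    components: List[Set[str]] = []

    def visit(v: str) -> None:
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC
            component: Set[str] = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return components


def bottom_sccs(graph: Graph) -> List[Set[str]]:
    """SCCs with no edge leaving them."""
    return [c for c in strongly_connected_components(graph)
            if all(w in c for v in c for w in graph[v])]


def reachability_core(graph: Graph, acc1: Set[str], acc2: Set[str]) -> Tuple[Set[str], Set[str]]:
    """T_i = union of the BSCCs that contain an accepting vertex of objective i."""
    bsccs = bottom_sccs(graph)
    t1 = set().union(*[c for c in bsccs if c & acc1])
    t2 = set().union(*[c for c in bsccs if c & acc2])
    return t1, t2


def aa_synt_buchi(graph: Graph, acc1: Set[str], acc2: Set[str]) -> bool:
    """Theorem 6 test, assuming `graph` is already the largest admissible sub-graph."""
    t1, t2 = reachability_core(graph, acc1, acc2)
    return bool(t1 & t2)


# Hypothetical graph: (bc)^omega satisfies both objectives, but the BSCCs {a}, {d} are disjoint.
g = {"v0": ["b"], "b": ["c", "d"], "c": ["b", "a"], "a": ["a"], "d": ["d"]}
print(reachability_core(g, {"b", "d"}, {"a", "c"}))  # ({'d'}, {'a'})
print(aa_synt_buchi(g, {"b", "d"}, {"a", "c"}))  # False
```

As in Ex. 5, the objectives overlap only on a cycle that is not bottom, so the reachability cores are disjoint and no pair of admissible-winning tenders exists.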


As for reachability (see Rem. 2), for Büchi objectives the same algorithm for AA-SYNT on binary graphs can be used to obtain a sound solution for general graphs, and the completeness question is left open for future work.


7  Assume-Guarantee Decentralized Synthesis 


We present the assume-guarantee decentralized synthesis problem, the variant with the highest degree of synchronization among the tenders, which in turn gives it the broadest applicability. In this synthesis procedure, we assume that we are given a pair of languages $A_1, A_2 \subseteq V^\omega$, called the assumptions. Intuitively, each tender $\tau_i$ can assume that $A_i$ is fulfilled by the other tender and, in return, needs to guarantee that $A_{-i}$ is fulfilled, in addition to fulfilling its own objective.


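Definition 10 (Contract-abiding tenders). Let $G$ be a graph, $\varphi_i$ be an ω-regular objective, and $A_1, A_2$ be a pair of ω-regular assumptions in $G$. We say a tender $\tau_i = (\alpha_i, \cdot, \cdot) \in \mathcal{T}^G$ fulfills $\varphi_i$ under the contract $(A_1, A_2)$, written $\tau_i \models (A_i \rhd \varphi_i \rhd A_{-i})$, iff

(a) for every finite path $\rho$, if $\rho$ is in $\mathrm{pref}(A_i)$, then $\rho \cdot \alpha_i(\rho) \in \mathrm{pref}(A_1 \cap A_2)$, and
(b) for every other compatible tender $\tau_{-i} \in \mathcal{T}^G$, we have $\tau_i \bowtie \tau_{-i} \models \mathrm{Safe}_G(\mathrm{pref}(A_i$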


Here, each tender $\tau_i$ makes only a safety assumption on the other tender (Cond. (a)), namely that the path does not leave the safe set $\mathrm{pref}(A_i)$, and in return provides a full guarantee on $A_{-i}$ (Cond. (b)). Normally, safety assumptions are not enough for fulfilling liveness guarantees and objectives [5]. But in bidding games, within the safe set, the players can use a known bidding tactic [9] to accumulate enough budget from time to time to reach the liveness goals infinitely often. We let $A_1, A_2$ be ω-regular sets, though we conjecture that safety assumptions suffice. The assume-guarantee distributed synthesis problem asks to compute a pair of tenders that fulfill their objectives under the given contract, as stated below.


Problem 3 (AG-SYNT). Define AG-SYNT as the problem that takes as input a tuple $(G, \varphi_1, \varphi_2, A_1, A_2)$, where $G$ is a graph, $\varphi_1$ and $\varphi_2$ are overlapping objectives in $G$, and $A_1$ and $A_2$ are two ω-regular languages over $V$ with $v^0 \in \mathrm{pref}(A_1 \cap A_2)$, and the goal is to decide whether there exists a pair of tenders $\tau_1, \tau_2 \in \mathcal{T}^G$ such that:

(I) $\tau_1$ and $\tau_2$ are compatible,
(II) $\tau_1 \models (A_1 \rhd \varphi_1 \rhd A_2)$, and
(III) $\tau_2 \models (A_2 \rhd \varphi_2 \rhd A_1)$.


When the assumptions allow all behaviors, i.e., $A_1 = A_2 = V^\omega$, AG-SYNT is equivalent to STRONG-SYNT. On the other hand, when the assumptions allow only each other's objectives, i.e., $A_1 = \varphi_1$ and $A_2 = \varphi_2$, we obtain a purely cooperative synthesis algorithm. We prove that satisfaction of the contracts by a pair of tenders implies satisfaction of $\varphi_1 \cap \varphi_2$.


Proposition 3 (Sound composition of contract-abiding tenders). Let $\tau_1$ and $\tau_2$ be tenders that fulfill the requirements stated in Prob. 3. Then, $\tau_1 \bowtie \tau_2 \models \varphi_1 \cap \varphi_2$.


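Proof. In the following, for a given language $L \subseteq V^\omega$, we write $\mathrm{Safe}_G(\mathrm{pref}(L))$ to denote the set of infinite paths every finite prefix of which can be extended to a word in $L$, i.e., $\{v^0 v^1 \cdots \in \mathrm{Paths}^\omega(G) \mid \forall i \geq 0.\ v^0 \cdots v^i \in \mathrm{pref}(L)\}$.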

We claim that both assumptions $A_1, A_2$ will be fulfilled, from which Cond. (b) of Def. 10 implies satisfaction of $\varphi_1$ and $\varphi_2$ by $\tau_1$ and $\tau_2$, respectively. Let $A = A_1 \cap A_2$; then $A$ can be decomposed into safety and liveness components as $A = \mathrm{Safe}_G(\mathrm{pref}(A)) \cap (\mathrm{Safe}_G(\mathrm{pref}(A)) \Rightarrow A)$. We prove the claim on the two components separately. Firstly, the fact that $\tau_1 \bowtie \tau_2$ implements $\mathrm{pref}(A)$ on $G$ can be proven by induction over the length of the generated path. The base case is given by the assumption $v^0 \in \mathrm{pref}(A_1 \cap A_2)$. For every finite path $\rho$, if $\tau_i$ wins the bidding and $\rho \in \mathrm{pref}(A_1 \cap A_2) \subseteq \mathrm{pref}(A_i)$, then $\tau_i$ needs to ensure that the next vertex $v'$ satisfies $\rho v' \in \mathrm{pref}(A_i \cap A_{-i})$ (a consequence of Cond. II–III of Prob. 3 and Cond. (a) of Def. 10). Hence the path always remains inside $\mathrm{pref}(A_1 \cap A_2)$, proving the safety part.

For the liveness part, we use known results from Richman bidding games, which guarantee that in an infinite-horizon game, with any positive initial budget, a player can always eventually visit any vertex that is reachable [37]. This implies that if the invariant $\mathrm{Safe}_G(\mathrm{pref}(A_1 \cap A_2))$ holds, then each tender $\tau_i$ can actually fulfill $A_{-i}$ in the long run when composed with any compatible tender (it is required to do so by Cond. (b) of Def. 10). Therefore, $A_1 \cap A_2$ will be fulfilled.


In the bidding games literature, it is unknown how to compute strategies for objectives that may be violated if a given assumption is violated by the opponent, as in Cond. (a) of Def. 10. The challenge stems from the lack of separation between the sets of actions available to the players, which prevents us from imposing assumptions only on the opponent's behavior. We present a practically motivated, sound but possibly incomplete solution for the decentralized synthesis problem that uses a stronger way of satisfying the contract, namely by requiring each tender $\tau_i$ to use actions so that the generated path remains in $\mathrm{pref}(A_1 \cap A_2)$ at all times. Formally, we say that the tender $\tau_i$ strongly fulfills $\varphi_i$ under the contract $(A_1, A_2)$, written $\tau_i \models_s (A_i \rhd \varphi_i \rhd A_{-i})$, if, instead of Cond. (a) of Def. 10, for every finite path $\rho$ we have $\rho \cdot \alpha_i(\rho) \in \mathrm{pref}(A_1 \cap A_2)$, regardless of whether $\rho \in \mathrm{pref}(A_i)$ or not, and moreover Cond. (b) of Def. 10 is fulfilled. It is easy to show that $\tau_i \models_s (A_i \rhd \varphi_i \rhd A_{-i})$ implies $\tau_i \models (A_i \rhd \varphi_i \rhd A_{-i})$, so that $\varphi_1 \cap \varphi_2$ will be fulfilled.
Similar to AA-SYNT, we extract a sub-graph $G'$ of $G$, called the largest contract-satisfying sub-graph, whose every path belongs to $\mathrm{pref}(A_1 \cap A_2)$, and vice versa; we omit the construction, which follows the usual automata-theoretic procedure from the literature [4]. For example, in Ex. 4, the largest contract-satisfying sub-graph of the example graph is the one that excludes only the vertices $c$ and $f$. It follows that when the tenders strongly fulfill their objectives under the contracts, every path is guaranteed to remain in $G'$.


Theorem 7 (Assume-guarantee decentralized synthesis). Let $G = (V, v^0, E)$ be a graph, $\varphi_1$ and $\varphi_2$ be a pair of overlapping ω-regular objectives, and $A_1$ and $A_2$ be ω-regular assumptions. Let $G'$ be the largest contract-satisfying sub-graph of $G$. A pair of robust tenders exists if $\mathrm{Th}^{G'}_{\varphi_1 \cap A_2}(v^0) + \mathrm{Th}^{G'}_{\varphi_2 \cap A_1}(v^0) < 1$. Moreover, AG-SYNT is in PTIME.


8 Related Work 


Shielding [35] is a framework in which a runtime monitor called a shield enforces an unverified policy π (e.g., generated using reinforcement learning [7]) to satisfy a given specification. A shield operates by observing, at each point in time, the action proposed by π, and it can alter this action, e.g., if safety is violated. The choice of who acts at each point in time, π or the shield, can be seen as a scheduling choice similar to our setting. However, the goals of the two approaches are different: our goal is to design tenders for modular policy synthesis, whereas a shield is meant as a verified “wrapper” for a complex policy. Technically, in auction-based scheduling, the scheduling depends on the auction, which is external to the policies, whereas in shielding, it is the shield that chooses whether to override π.

In distributed reactive synthesis [42,36,32], the goal is to design a collection of Mealy machines whose communication is dictated by a given communication architecture. Distributed synthesis is well studied, and we point to a number of works that considered objectives that are a conjunction $\varphi_1 \wedge \varphi_2 \wedge \cdots$ of sub-objectives $\varphi_1, \varphi_2, \ldots$ [30,21,39,31,18,6]. While there is a conceptual similarity
between our synthesis of tenders and the synthesis of Mealy machines, there 
is a fundamental difference between the approaches. Namely, our composition 
is based on scheduling, i.e., exactly one policy is scheduled at each point in 
time, whereas in distributed synthesis, the composition of the Mealy machines 
is performed in parallel, i.e., they all read and write at each point in time. 

Our algorithms build upon the rich literature on bidding games on graphs. The bidding mechanism that we focus on is called Richman bidding [38,37,10]. Other bidding mechanisms have been studied: poorman [11], taxman [13], and all-pay [14,15]. Auction-based scheduling can be instantiated with any of these mechanisms, and the properties from bidding games transfer immediately (these properties differ significantly for quantitative objectives). Of particular interest in practice is discrete bidding, in which the granularity of the bids is restricted [27,2,16]. To the best of our knowledge, beyond our work, non-zero-sum bidding games have only been considered in [40]. The solution concept considered there is subgame perfect equilibrium (SPE). While SPE is suitable for modeling the interaction between selfish agents, we argue that it is less suitable for decentralized synthesis.

There are many works on designing optimal policies for multi-objective sequential decision-making problems for various system models; see the survey by Roijers et al. [43] and works on multi-objective stochastic games [20,21,24]. To the best of our knowledge, no prior work considers the decomposition of the problem into individual task-dependent policies as we do. Auctions to distribute tasks to agents have been considered extensively [29,44,25,28,19,26,41]. Their goal is very different: there, agents bid for tasks, that is, a bid represents an agent's cost (e.g., in terms of resources) for performing a task. The auction then allocates the tasks to agents so as to minimize the individual costs, giving rise to an efficient global policy.


9 Conclusion and Future Work 


We present the auction-based scheduling framework. Rather than synthesizing a monolithic policy for a conjunction of objectives $\varphi_1 \wedge \varphi_2$, we synthesize two independent tenders, one for each objective, and compose the tenders at runtime using auction-based scheduling. A key advantage of the framework is modularity; each tender can be synthesized and modified independently. We study three instantiations of decentralized synthesis in planning problems with varying degrees of flexibility and practical usability, and develop algorithms based on bidding games. Interestingly, we show that a pair of admissible-winning tenders always exists in binary graphs for reachability objectives and that such tenders can be found in PTIME. This positive result illustrates the strength and potential of the auction-based scheduling framework.

There are plenty of directions for future research, and we list a handful. First, we consider only qualitative objectives, and it is interesting to lift the results to quantitative objectives, where one can quantify the fairness achieved by the scheduling mechanism in a fine-grained manner. Moreover, it is appealing to employ the rich literature on mean-payoff bidding games. Second, we consider a conjunction of two objectives, and it is interesting to extend the approach to a conjunction of multiple objectives. This will require extending the theory of bidding games to the multi-player setting, which has not yet been studied. Finally, it is particularly interesting to extend the technique of auction-based scheduling beyond path-planning problems; for example, one could consider decentralized synthesis of controllers that operate in an adversarial or probabilistic environment. Again, the corresponding bidding games need to be studied (so far, only sure winning has been considered for bidding games played on MDPs [12]).
Abstract This paper considers the problem of co-synthesis in $k$-player games over a finite graph where each player has an individual ω-regular specification $\varphi_i$. In this context, a secure equilibrium (SE) is a Nash equilibrium w.r.t. the lexicographically ordered objectives of each player to first satisfy their own specification, and second, to falsify other players' specifications. A winning secure equilibrium (WSE) is an SE strategy profile $(\pi_i)_{i \in [1;k]}$ that ensures the specification $\varphi := \bigwedge_i \varphi_i$ if no player deviates from their strategy $\pi_i$. Distributed implementations generated from a WSE make components act rationally by ensuring that a deviation from the WSE strategy profile is immediately punished by a retaliating strategy that makes the involved players lose.

In this paper, we move from deviation punishment in WSE-based implementations to a distributed, assume-guarantee based realization of WSE. This shift is obtained by generalizing WSE from strategy profiles to specification profiles $(\rho_i)_{i \in [1;k]}$ with $\bigwedge_{i \in [1;k]} \rho_i = \varphi$, which we call most general winning secure equilibria (GWSE). Such GWSE have the property that each player can individually pick a strategy $\pi_i$ winning for $\rho_i$ (against all other players), and all resulting strategy profiles $(\pi_i)_{i \in [1;k]}$ are guaranteed to be a WSE. The obtained flexibility in players' strategy choices can be utilized for robustness and adaptability of local implementations. Concretely, our contribution is three-fold: (1) we formalize GWSE for $k$-player games over finite graphs, where each player has an ω-regular specification $\varphi_i$; (2) we devise an iterative semi-algorithm for GWSE synthesis in such games; and (3) we obtain an exponential-time algorithm for GWSE synthesis with parity specifications $\varphi_i$.
1 Introduction


Games over graphs provide a well-known abstraction for many challenging correct-by-construction synthesis problems for software and hardware in embedded cyber-physical applications. In particular, the correct-by-construction co-synthesis of multiple interacting (reactive) components, each with its own correctness specification, poses, as of today, severe challenges in automated system design.


* Authors are supported by the DFG project 389792660 TRR 248-CPEC. Additionally, A.-K. Schmuck is supported by the DFG project SCHM 3541/1-1.


© The Author(s) 2024
B. Finkbeiner and L. Kovács (Eds.): TACAS 2024, LNCS 14572, pp. 173-193, 2024.
https://doi.org/10.1007/978-3-031-57256-2_9


S. P. Nayak and A. Schmuck

While many of these challenges arise from the fact that not every component has the same information about all relevant variables in the system, even in the seemingly simple setting of full information (where all components see the valuation of all variables), finding the right balance between centralized and local reasoning for co-synthesis is surprisingly challenging. While assuming that all players cooperate might demand too much commitment from individual components, a fully adversarial setting where all other components are assumed to harm a local implementation (independently of their own objective) might not capture a realistic scenario either.

To address this issue, starting with the seminal work of Chatterjee et al. [13], the concept of rationality, stemming from classical game theory, was brought to graph games in order to formalize a more realistic model for the interaction of multiple components in co-synthesis. The main conceptual contribution of [13] was the introduction of secure equilibria (SE), a special sub-class of Nash equilibria given as particular strategy profiles. Intuitively, an SE is a Nash equilibrium w.r.t. the lexicographically ordered objectives of each player to first satisfy their own specification, and only second, to falsify other players' specifications. More specifically, it is a strategy profile, i.e., a tuple $(\pi_i)_i$ with $\pi_i$ being the strategy of Player $i$, such that no player can improve w.r.t. their lexicographically ordered objective by deviating from this strategy.

As stated in [13, p. 68], an SE can thus be interpreted as a contract between the players which enforces cooperation: any unilateral selfish deviation by one player cannot put the other players at a disadvantage if they follow the SE. While this property makes SE very desirable, their main drawback, as most prominently pointed out by [5], is their restriction to a single strategy profile. This, in combination with classical reactive synthesis engines typically preferring small and goal-oriented strategies, incentivizes “immediate punishment” of deviations from an SE strategy profile in the final implementation.


Motivating Example. To illustrate this effect, let us consider the game depicted in Fig. 1, taken from [13]. Here, an SE can be described as follows: if Player 1 always chooses $v_3 \to v_1$ (forming $\pi_1$) and Player 2 always chooses $v_0 \to v_2$ and $v_3 \to v_3$ (forming $\pi_2$), then they both satisfy their specifications; if Player 1 deviates by choosing $v_3 \to v_2$ (risking falsification of $\varphi_2$), then Player 2 can retaliate by choosing $v_0 \to v_4$ (ensuring falsification of both $\varphi_i$); similarly, if Player 2 deviates by choosing $v_0 \to v_3$ (risking falsification of $\varphi_1$), then Player 1 can retaliate by choosing $v_3 \to v_4$ (ensuring falsification of both $\varphi_i$). Clearly, the strategy profile $(\pi_1, \pi_2)$ is an SE. It is, in particular, a winning SE, as both players satisfy their specifications when following it. However, as the outlined retaliating strategies are also part of the final implementation generated from this SE, any play that deviates from $(\pi_1, \pi_2)$ only once makes the game end up in a loop at $v_4$, resulting in neither player satisfying their objective. Intuitively, this way of implementing SE-based strategies makes components act rationally by ensuring that a deviation from the contract is immediately punished.


Figure 1: A two-player game with Player 1's vertices (squares) and Player 2's vertices (circles), where Player $i$'s specification $\varphi_i = \square\lozenge v_i$ is to visit $v_i$ infinitely often.

Having the interpretation of an SE as a contract in mind, it is, however, very appealing to think about the realization of this contract in the final implementation in a more permissive way. Intuitively, in the game depicted in Fig. 1, both players can satisfy their specifications $\phi_i$ without the help of the other player, as long as the play does not go to $v_4$. In particular, whenever both players independently choose a strategy $\pi_i$ which ensures that they (i) never take their edge to $v_4$ and (ii) satisfy $\phi_i$ for every strategy $\pi_{-i}$ of the other player that also never takes their edge to $v_4$, the resulting strategy profile $(\pi_1,\pi_2)$ is an SE. These minimal cooperation obligations for an SE can be interpreted as a specification profile $(\phi'_1,\phi'_2)$ s.t. $\phi'_1 = \psi_1 \wedge (\psi_2 \Rightarrow \phi_1)$ and $\phi'_2 = \psi_2 \wedge (\psi_1 \Rightarrow \phi_2)$, where $\psi_1 = \Box\neg(v_3 \wedge \bigcirc v_4)$ and $\psi_2 = \Box\neg(v_0 \wedge \bigcirc v_4)$ express the assumption discussed above that Player $i$ does not move to $v_4$ from their vertex. It turns out that this new specification profile $(\phi'_1,\phi'_2)$ has three nice properties. (i) It is most general, meaning it does not lose any cooperative solution, i.e., $\mathcal{L}(\phi_1 \wedge \phi_2) = \mathcal{L}(\phi'_1 \wedge \phi'_2)$. (ii) It is realizable, i.e., Player $i$ has a strategy $\pi_i$ that satisfies $\phi'_i$ in a zero-sum sense (i.e., no matter what the other player does). Most importantly, (iii) it is secure (winning), i.e., every strategy profile $(\pi_1,\pi_2)$ where Player $i$'s strategy $\pi_i$ satisfies $\phi'_i$ (in a zero-sum sense) is a winning SE. Properties (i) and (iii) motivated us to call the set of new specifications a most general winning secure equilibrium (GWSE). Property (ii) ensures that any specification $\phi'_i$ from this tuple is locally and fully independently realizable by every component. Conceptually, this allows us to move from deviation-punishment in SE-based implementations to a distributed, assume-guarantee based realization of SE.


Contribution. By moving from strategy profiles (WSE) to specification profiles (GWSE) for SE realizations, our approach takes the conceptualisation of rationality for distributed synthesis to an extreme. As we are in the position to design every component (as it is a computer system, not a human, that actually acts rationally), we can enforce that implementations respect the new specifications $\phi'_i$. We only use the concept of rationality encoded in WSE to automatically obtain meaningful and implementable distributed specifications $\phi'_i$ for this co-design process. Thereby, the implementation of an accompanying punishment mechanism to enforce rationality of players becomes obsolete. The obtained flexibility in players' strategy choices can be utilized for robustness and adaptability of local implementations, which makes GWSE particularly suited for embedded systems applications.

Concretely, our contribution is three-fold: (1) We formalize GWSE for $k$-player games over finite graphs, where each player has an $\omega$-regular specification. (2) We devise an iterative semi-algorithm¹ for GWSE synthesis under $\omega$-regular specifications. (3) We give a (sound but incomplete) exponential-time algorithm for GWSE synthesis under parity specifications.

¹ A semi-algorithm is an algorithm that is not guaranteed to halt on all inputs.
Other Related Work. After the introduction of secure equilibria (SE) by Chatterjee et al. [13], there have been several efforts to extend the notion to other classes of games, e.g., games with sup, inf, lim sup, lim inf, and mean-payoff measures [9], multi-player games with probabilistic transitions [17], or quantitative reachability games [8]. Furthermore, a variant of secure equilibria, called Doomsday equilibria, was studied in [12]: if any coalition of players deviates and violates one player's objective, then the objective of every player is violated. Moreover, the notion of secure equilibria has been applied effectively in the synthesis of mutual-exclusion protocols [15] and fair-exchange protocols [21,23].
Motivated by similar insights, other concepts of rationality have also been introduced in multi-player games, e.g., subgame perfect equilibria [29,7,28,10,6] or rational synthesis [20,22,18]. Similar to the implementations of SE by [13], these works restrict implementations to a single strategy profile. In contrast, our work introduces a more flexible concept of rationality that is closely related to contract-based distributed synthesis, as in [24,19,16,2]. There, an assume-guarantee contract is synthesized such that every strategy realizing the guarantee is ensured to win whenever the other players satisfy the assumption. While this is conceptually similar to our synthesis of GWSE, these works do not consider the players to be adversarial, and hence, there is no notion of equilibria.
To the best of our knowledge, the only other work that also combines flexibility with equilibria is assume-admissible (AA) synthesis [5]. Their work utilizes a different, incomparable definition of rationality based on a dominance order. The two approaches are incomparable: there exist co-synthesis problems where our approach successfully synthesizes a GWSE and no AA contract exists, and vice versa (see Ex. 1 for details). Conceptually, AA contracts still require rational behaviour of players within the contract. Our approach, in contrast, only uses rationality as a concept to synthesize meaningful local specifications, which can then be implemented in an arbitrary (non-rational) manner. We believe that this is a major strength of our approach compared to AA synthesis.


2 Preliminaries 


Notation. We use $\mathbb{N}$ to denote the set of natural numbers including zero. Given $a,b \in \mathbb{N}$ with $a \le b$, we use $[a;b]$ to denote the set $\{n \in \mathbb{N} \mid a \le n \le b\}$. For any given set $[a;b]$, we write $i \in_{even} [a;b]$ and $i \in_{odd} [a;b]$ as shorthand for $i \in [a;b] \cap \{0,2,4,\ldots\}$ and $i \in [a;b] \cap \{1,3,5,\ldots\}$, respectively. For a finite alphabet $\Sigma$, $\Sigma^*$ and $\Sigma^\omega$ denote the sets of finite and infinite words over $\Sigma$, respectively.
Linear Temporal Logic (LTL). Given a finite set $AP$ of atomic propositions, linear temporal logic (LTL) formulas over $AP$ are defined by the grammar:

$$\phi := p \in AP \mid \phi \vee \phi \mid \neg\phi \mid \bigcirc\phi \mid \phi\,\mathcal{U}\,\phi,$$

where $\vee$, $\neg$, $\bigcirc$, and $\mathcal{U}$ denote the operators disjunction, negation, next, and until, respectively. Furthermore, we use the usual derived operators: $True = p \vee \neg p$, $False = \neg True$, conjunction $\phi \wedge \phi' = \neg(\neg\phi \vee \neg\phi')$, implication $\phi \Rightarrow \phi' = \neg\phi \vee \phi'$, and other temporal operators such as finally $\Diamond\phi = True\,\mathcal{U}\,\phi$ and globally $\Box\phi = \neg\Diamond\neg\phi$.
The semantics of LTL formulas are defined as usual (see standard textbooks [3]). 
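One way to make these semantics concrete is to evaluate formulas on ultimately periodic (lasso) words $u \cdot w^\omega$. The following Python sketch is only an illustration under our own assumptions. Formulas use a hypothetical nested-tuple encoding, and each letter of the word is a single vertex name, so an atomic proposition $p$ holds at a position iff the vertex there is $p$. The until operator is computed as a least fixpoint over the finitely many positions of the lasso. $\Diamond$ and $\Box$ are derived exactly as defined above.

```python
from typing import List, Tuple

# Hypothetical tuple encoding of LTL formulas (an assumption of this sketch):
#   ("true",) | ("ap", p) | ("not", f) | ("or", f, g) | ("and", f, g)
#   | ("next", f) | ("until", f, g) | ("F", f) | ("G", f)
Formula = Tuple


def holds(formula: Formula, prefix: List[str], loop: List[str]) -> bool:
    """Decide whether the lasso word prefix . loop^omega satisfies `formula` at position 0.

    Each letter is a vertex name; the atomic proposition p holds at a position
    iff the vertex at that position equals p.
    """
    if not loop:
        raise ValueError("the loop of a lasso word must be non-empty")
    word = prefix + loop
    n = len(word)
    # successor position; the last position jumps back to the start of the loop
    succ = list(range(1, n)) + [len(prefix)]

    def sat(f: Formula) -> List[bool]:
        op = f[0]
        if op == "true":
            return [True] * n
        if op == "ap":
            return [letter == f[1] for letter in word]
        if op == "not":
            return [not b for b in sat(f[1])]
        if op in ("or", "and"):
            a, b = sat(f[1]), sat(f[2])
            return [(x or y) if op == "or" else (x and y) for x, y in zip(a, b)]
        if op == "next":
            a = sat(f[1])
            return [a[succ[i]] for i in range(n)]
        if op == "until":
            a, b = sat(f[1]), sat(f[2])
            res = list(b)  # least fixpoint of: res = b or (a and res[succ])
            changed = True
            while changed:
                changed = False
                for i in range(n):
                    if not res[i] and a[i] and res[succ[i]]:
                        res[i] = changed = True
            return res
        if op == "F":  # finally f == True U f
            return sat(("until", ("true",), f[1]))
        if op == "G":  # globally f == not F not f
            return sat(("not", ("F", ("not", f[1]))))
        raise ValueError(f"unknown operator {op!r}")

    return sat(formula)[0]
```

For example, `holds(("G", ("F", ("ap", "v1"))), ["v0"], ["v3", "v1"])` evaluates $\Box\Diamond v_1$ on the word $v_0 (v_3 v_1)^\omega$ and returns `True`. The edge sugar $e = (u,v)$ corresponds to `("and", ("ap", u), ("next", ("ap", v)))`.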


Game Graphs. A $k$-player (turn-based) game graph is a tuple $G = (V, E, v_0)$ where $(V, E)$ is a finite directed graph with vertices $V$ and edges $E$, and $v_0 \in V$ is an initial vertex. For such a game graph, let $P = [1;k]$ be the set of players such that $V = \biguplus_{i\in P} V_i$ is partitioned into the vertices of the $k$ players in $P$. We write $E_i$, $i \in P$, to denote the edges from Player $i$'s vertices, i.e., $E_i = E \cap (V_i \times V)$. Further, we write $V_{-i}$ and $E_{-i}$ to denote the sets $\bigcup_{j\neq i} V_j$ and $\bigcup_{j\neq i} E_j$, respectively. A play from a vertex $u_0$ is a finite or infinite sequence of vertices $\rho = u_0u_1\ldots$ with $(u_j, u_{j+1}) \in E$ for all $j \geq 0$.

Specifications. Given a game graph $G$, we consider specifications given by an LTL formula $\phi$ over the vertex set $V$, that is, LTL formulas whose atomic propositions are the vertices in $V$. In this case, the set of desired infinite plays is given by the semantics of $\phi$ over $G$, which is an $\omega$-regular language $\mathcal{L}(G,\phi) \subseteq V^\omega$. We just write $\mathcal{L}(\phi)$ to denote this language when the game graph $G$ is clear from the context. Every game graph with an arbitrary $\omega$-regular set of desired infinite plays can be reduced to a game graph (possibly with an extended set of vertices) with an LTL objective, as above. The standard definitions of $\omega$-regular languages are omitted for brevity and can be found in standard textbooks [3]. To simplify notation, we use $e = (u,v)$ in LTL formulas as syntactic sugar for $u \wedge \bigcirc v$.


Games and Strategies. A $k$-player game is a pair $\mathcal{G} = (G, (\phi_i)_{i\in P})$ where $G$ is a $k$-player game graph and each $\phi_i$ is an objective for Player $i$ over $G$. A strategy of Player $i$, $i \in P$, is a function $\pi_i : V^*V_i \to V$ such that for every $\rho v \in V^*V_i$, it holds that $(v, \pi_i(\rho v)) \in E$. A strategy profile for a set of players $P' \subseteq P$ is a tuple $\Pi = (\pi_i)_{i\in P'}$ of strategies, one for each player in $P'$. To simplify notation, we write $P_{-i}$ and $\pi_{-i}$ to denote the set $P \setminus \{i\}$ and the strategy profile $(\pi_j)_{j\in P_{-i}}$, respectively. Given a strategy profile $(\pi_i)_{i\in P'}$, we say that a play $\rho = u_0u_1\ldots$ is a $(\pi_i)_{i\in P'}$-play if for every $i \in P'$ and for all $\ell \geq 1$, it holds that $u_{\ell-1} \in V_i$ implies $u_\ell = \pi_i(u_0\ldots u_{\ell-1})$.
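The notions of game graph, strategy and $(\pi_i)_{i\in P}$-play can be illustrated with a small Python sketch. As a simplifying assumption of our own, it only handles positional strategies, where the choice depends on the current vertex alone. The definition above allows history-dependent strategies $\pi_i : V^*V_i \to V$. Under this assumption, the unique play from $v_0$ of a full strategy profile is a lasso, which the sketch returns as `(prefix, loop)`. The names `GameGraph`, `edges_of` and `play_of_profile` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


@dataclass
class GameGraph:
    owner: Dict[str, int]  # vertex v -> the player i with v in V_i (a partition of V)
    edges: Set[Edge]       # E
    init: str              # initial vertex v_0

    def edges_of(self, i: int) -> Set[Edge]:
        """E_i = E ∩ (V_i x V): the edges leaving Player i's vertices."""
        return {(u, v) for (u, v) in self.edges if self.owner[u] == i}


def play_of_profile(g: GameGraph, profile: Dict[int, Dict[str, str]]) -> Tuple[List[str], List[str]]:
    """Return the unique play from v_0 consistent with a positional strategy profile,
    as a lasso (prefix, loop). profile[i][v] is Player i's choice at v in V_i."""
    first_visit: Dict[str, int] = {}
    path: List[str] = []
    v = g.init
    while v not in first_visit:
        first_visit[v] = len(path)
        path.append(v)
        nxt = profile[g.owner[v]][v]
        if (v, nxt) not in g.edges:
            raise ValueError(f"Player {g.owner[v]} chooses {(v, nxt)}, which is not an edge")
        v = nxt
    k = first_visit[v]  # the first repeated vertex starts the loop
    return path[:k], path[k:]
```

On a toy graph (not one of the paper's figures) with vertices `a`, `c` owned by Player 1 and `b` owned by Player 2, the profile `{1: {"a": "b", "c": "c"}, 2: {"b": "c"}}` yields the play $ab\,c^\omega$, returned as `(["a", "b"], ["c"])`.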

Satisfying Specifications. Given a game graph $G$ and a specification $\phi$, a play $\rho$ satisfies $\phi$ if $\rho \in \mathcal{L}(\phi)$. A strategy profile $(\pi_i)_{i\in P'}$ satisfies (or is winning w.r.t.) a specification $\phi$ from a vertex $v$, denoted by $(\pi_i)_{i\in P'} \models_v \phi$, if every $(\pi_i)_{i\in P'}$-play from $v$ satisfies $\phi$. We just write $(\pi_i)_{i\in P'} \models \phi$ if $v$ is the initial vertex. We collect all vertices from which there exists a strategy profile for the players in $P'$ that satisfies $\phi$ in the winning region² $\langle\!\langle P'\rangle\!\rangle(G,\phi)$. We just write $\langle\!\langle P'\rangle\!\rangle\phi$ to denote this set if the game graph $G$ is clear from the context. Furthermore, we write $\phi_{-i}$ to denote $\bigwedge_{j\in P_{-i}}\phi_j$.

Parity Specifications. Given a game graph $G = (V, E, v_0)$, a specification $\phi$ is called parity if $\phi = Parity(\Omega) = \bigwedge_{i\in_{odd}[0;d]}\big(\Box\Diamond\Omega_i \Rightarrow \bigvee_{j\in_{even}[i+1;d]}\Box\Diamond\Omega_j\big)$, with $\Omega_i = \{v \in V \mid \Omega(v) = i\}$ for some priority function $\Omega : V \to [0;d]$ that assigns each vertex a priority. A play satisfies such a specification if the maximum of the priorities seen infinitely often is even.
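On a lasso play $u \cdot w^\omega$, exactly the vertices of the loop $w$ are seen infinitely often. So the two readings of the parity condition above (the conjunction over odd priorities, and "the maximal recurring priority is even") can be compared directly. The following Python sketch is an illustration of our own. It implements both readings on loops and checks exhaustively, for one small priority function chosen by us, that they agree.

```python
from itertools import product
from typing import Dict, List


def parity_by_max(priority: Dict[str, int], loop: List[str]) -> bool:
    """A lasso play prefix . loop^omega wins iff the maximal priority on the loop is even
    (the prefix is irrelevant: only loop vertices are seen infinitely often)."""
    return max(priority[v] for v in loop) % 2 == 0


def parity_by_formula(priority: Dict[str, int], loop: List[str], d: int) -> bool:
    """The LTL shape: for every odd i in [0;d], if Omega_i is seen infinitely often,
    then Omega_j is seen infinitely often for some even j in [i+1;d]."""
    seen_inf = {priority[v] for v in loop}
    return all(
        any(j in seen_inf for j in range(i + 1, d + 1) if j % 2 == 0)
        for i in range(1, d + 1, 2)
        if i in seen_inf
    )


# Exhaustive agreement check on all loops of length <= 3 over four vertices
prio = {"a": 0, "b": 1, "c": 2, "d": 3}
for length in range(1, 4):
    for loop in product(prio, repeat=length):
        assert parity_by_max(prio, list(loop)) == parity_by_formula(prio, list(loop), d=3)
```

For instance, a loop visiting priorities 0 and 1 is losing, because the odd priority 1 recurs without a larger even priority.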


² Slightly abusing notation, we write $\langle\!\langle i\rangle\!\rangle\phi$ for singleton sets of players $P' = \{i\}$.

3 Most General Winning Secure Equilibria


This section formalizes most general winning secure equilibria (GWSE). In order to do so, we first recall the notion of secure equilibria from [13].

Secure Equilibria. Given a $k$-player game $\mathcal{G} = (G, (\phi_i)_{i\in P})$ and a strategy profile $\Pi := (\pi_i)_{i\in P}$, one can define a payoff profile, denoted by $payoff(\Pi)$, as the tuple $(p_i)_{i\in P}$ s.t. $p_i = 1$ iff $\Pi \models \phi_i$, and $p_i = 0$ otherwise. With this, we can define a Player $j$ preference order $\prec_j$ on payoff profiles lexicographically, s.t.

$$(p_i)_{i\in P} \prec_j (p'_i)_{i\in P} \iff (p_j < p'_j) \vee \big((p_j = p'_j) \wedge (\forall i \neq j.\ p_i \geq p'_i) \wedge (\exists i \neq j.\ p_i > p'_i)\big).$$

Intuitively, this preference order captures the fact that every player's main objective is to satisfy their own specification $\phi_i$ and, as a secondary objective, to falsify the specifications of the other players.
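The lexicographic order $\prec_j$ can be transcribed directly. In the following Python sketch, `lex_less` is a hypothetical name, players are indexed from 0 instead of 1, and payoff profiles are 0/1 tuples.

```python
from typing import Sequence


def lex_less(j: int, p: Sequence[int], q: Sequence[int]) -> bool:
    """Return True iff p <_j q, i.e., Player j strictly prefers payoff profile q to p.

    Players are indexed 0..k-1 here (the text uses 1..k); payoffs are 0 or 1.
    """
    if p[j] < q[j]:  # primary objective: Player j's own specification
        return True
    others = [i for i in range(len(p)) if i != j]
    return (
        p[j] == q[j]
        and all(p[i] >= q[i] for i in others)  # no other player is better off in q
        and any(p[i] > q[i] for i in others)   # some other player is worse off in q
    )
```

For example, `lex_less(0, (1, 1), (1, 0))` returns `True`. The first player keeps its own specification and additionally falsifies the other player's, which is exactly the kind of deviation that an SE rules out.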


Definition 1. Given a $k$-player game $\mathcal{G} = (G, (\phi_i)_{i\in P})$, a strategy profile $\Pi := (\pi_i)_{i\in P}$ is a secure equilibrium (SE) if for all $i \in P$, there does not exist a strategy $\pi'_i$ of Player $i$ such that $payoff(\Pi) \prec_i payoff(\pi'_i, \pi_{-i})$.


It is well known that every secure equilibrium is also a Nash equilibrium in the classical sense. Within this paper, we only consider winning secure equilibria (WSE), i.e., SE with the payoff profile $(p_i = 1)_{i\in P}$. As WSE have a trivial payoff profile, they can be characterized without referring to payoffs, as formalized next.


Definition 2. Given a $k$-player game $(G, (\phi_i)_{i\in P})$, a winning secure equilibrium (WSE) is a strategy profile $(\pi_i)_{i\in P}$ such that (i) $(\pi_i)_{i\in P} \models \bigwedge_{i\in P}\phi_i$; and (ii) for every strategy $\pi'_i$ of Player $i$, if $(\pi'_i, \pi_{-i}) \not\models \phi_{-i}$ holds, then $(\pi'_i, \pi_{-i}) \not\models \phi_i$ holds.

Intuitively, item (i) ensures that the strategy profile satisfies all players' objectives. Item (ii) ensures that no player can improve by deviating from the prescribed strategy, i.e., no player can falsify another player's objective without falsifying their own.

Most General Winning Secure Equilibria. As illustrated by the motivating example in Sec. 1, we aim at generalizing WSE from single strategy profiles to specification profiles that capture an infinite number of WSE. These specification profiles $(\phi'_i)_{i\in P}$, which we call most general winning secure equilibria (GWSE), allow each player to locally (and fully independently) pick a strategy $\pi_i$ that is winning for $\phi'_i$ (in a zero-sum sense). It is then guaranteed that any resulting strategy profile $(\pi_i)_{i\in P}$ is indeed a WSE. This is formalized next.


Definition 3. Given a $k$-player game $(G, (\phi_i)_{i\in P})$, a tuple $(\phi'_i)_{i\in P}$ of specifications is said to be a most general winning secure equilibrium (GWSE) if it is

(i) (most) general: $\mathcal{L}(\bigwedge_{i\in P}\phi'_i) = \mathcal{L}(\bigwedge_{i\in P}\phi_i)$;
(ii) realizable: $v_0 \in \langle\!\langle i\rangle\!\rangle\phi'_i$ for all $i \in P$; and
(iii) secure (winning): every strategy profile $(\pi_i)_{i\in P}$ with $\pi_i \models \phi'_i$ is a WSE.

Intuitively, generality ensures that the transformation of the specifications $(\phi_i)_{i\in P}$ into new specifications $(\phi'_i)_{i\in P}$ does not lose any winning play. Further, realizability ensures that every single player can enforce $\phi'_i$ (without the help of other players) from the initial vertex. Finally, security ensures that any locally chosen strategy $\pi_i$ winning for $\phi'_i$ forms a strategy profile which is indeed a WSE.
4 Computing GWSE in $\omega$-regular Games


This section proposes an iterative semi-algorithm³ to compute GWSE, which utilizes the concept of adequately permissive assumptions (APA) introduced by Anand et al. [1]. Given a $k$-player game $(G, (\phi_i)_{i\in P})$, an APA is a specification $\psi_i$ that collects all Player $i$ strategies which allow for a cooperative solution if the other players cooperate. It therefore overapproximates the set of all Player $i$ strategies which could possibly form a WSE with the other players. As a consequence, the intersection $\bigwedge_{i\in P}\psi_i$ is an overapproximation of a GWSE. In order to refine this approximation, the next computation round can use the APAs of the other players when computing new local APAs. In order to properly formalize this idea, we first recall the concept of APAs from [1].


4.1 Adequately Permissive Assumptions 
Following [1], we define an adequately permissive assumption (APA) as follows.


Definition 4. Given a $k$-player game graph $G = (V, E, v_0)$ and a specification $\phi$, we say that a specification $\psi_i$ is an adequately permissive assumption (APA) on Player $i$ for $\phi$ if it is:

(i) sufficient: there exists a strategy profile $\pi_{-i}$ such that for every Player $i$ strategy $\pi_i$ with $\pi_i \models \psi_i$, we have $(\pi_i, \pi_{-i}) \models \phi$;
(ii) implementable: $\langle\!\langle i\rangle\!\rangle\psi_i = V$; and
(iii) permissive: $\mathcal{L}(\psi_i) \supseteq \mathcal{L}(\phi)$.


The intuition behind an APA is as follows. Even if a player cannot realize a specification $\phi$, they should at least satisfy an APA on them, as it will allow them to realize $\phi$ if the other players are willing to help (sufficiency). Further, such a behavior by Player $i$ does not prevent any WSE (permissiveness), and Player $i$ can individually choose to follow an APA (implementability).


Remark 1. While Def. 4 is an almost direct adaptation of [1, Def. 2-5] to $k$-player games, it has a couple of notable differences. First, Anand et al. define APAs for 2-player games and, conceptually, use APAs to constrain the opponent's moves. We can simply view the $k$-player game as a 2-player game between the protagonist Player $i$ and (the collection of) its opponents $P_{-i}$. However, in Def. 4, we use the computed assumption $\psi_i$ to constrain the protagonist's moves (not the opponents'). Second, the sufficiency condition for an APA in [1, Def. 2] does not depend on an initial vertex, so an APA always exists in their setting (possibly being $True$ when $\langle\!\langle P\rangle\!\rangle\phi = \emptyset$). In contrast, the $k$-player games in this paper have a designated initial vertex; hence, an APA exists iff $v_0 \in \langle\!\langle P\rangle\!\rangle\phi$.


With this insight, we can use the algorithm from [1] to compute APAs for parity specifications $\phi = Parity(\Omega)$ in polynomial time.


³ A semi-algorithm is an algorithm that is not guaranteed to halt on all inputs.

Lemma 1 ([1, Thm. 4]). Given a $k$-player game graph $G = (V, E, v_0)$ and a parity specification $\phi = Parity(\Omega)$, an APA on Player $i$ for $\phi$ can be computed, if one exists, in time $\mathcal{O}(|V|^4)$.


Let us write COMPUTEAPA($G$, $\phi$, $i$) to denote the procedure that returns this APA if it exists; otherwise, it returns $False$.


Remark 2. We note that Lem. 1 also gives a method to compute APAs for games with LTL or $\omega$-regular specifications, as such games can be converted into parity games (possibly with an extended game graph) by standard methods [3]. Therefore, with a slight abuse of notation, we will also call the algorithm COMPUTEAPA($G$, $\phi$, $i$) if $\phi$ is not a parity specification, with the understanding that the game is always converted into a parity game first. This might incur an exponential blowup of the state space. As we call COMPUTEAPA repeatedly to compute GWSEs, this blowup might cause non-termination (see Sec. 4.6 for details). In order to obtain a (non-optimal but) terminating algorithm for GWSE computation, we will mitigate this blowup later in Sec.


4.2 Iterative Computation of APAs


Given the results of the previous section, we can use the algorithm COMPUTEAPA on a given game $(G, (\phi_i)_{i\in P})$ to compute APAs for each player, i.e., $\psi_i :=$ COMPUTEAPA($G$, $\phi_i$, $i$). Intuitively, $\psi_i$ overapproximates the set of all Player $i$ strategies which could possibly form a WSE with the other players. As a consequence, the intersection $\bigwedge_{i\in P}\psi_i$ is an overapproximation of the GWSE.

As outlined previously, we will iteratively refine these computed APAs to finally compute the GWSE. In order to do so, we want to condition the computation of the next-round APA $\psi'_i$ on the previous-round APAs of all other players, $\psi_{-i} := \bigwedge_{j\in P_{-i}}\psi_j$, as any secure strategy of the players in $P_{-i}$ is incentivized to comply with $\psi_{-i}$. The most intuitive method to do this is to simply consider $\psi_{-i} \Rightarrow \phi_i$ as the specification for APA computation in the next round. However, the way sufficiency is formulated for APAs prevents this approach, as the implication $\psi_{-i} \Rightarrow \phi_i$ is true whenever $\psi_{-i}$ is false. As there obviously exists a strategy profile $\pi_{-i}$ which violates $\psi_{-i}$, the sufficiency condition becomes meaningless for this specification.
However, as we know that the $\psi_j$, $j \in P_{-i}$, are APAs, their implementability constraint (Def. 4(ii)) ensures that Player $i$ can neither enforce nor falsify them. Therefore, a new specification $\hat{\phi}_i := \psi_{-i} \wedge \phi_i$ still puts the whole burden of satisfying $\psi_{-i}$ on the players in $P_{-i}$. Hence, for sufficiency of the new APA, it implicitly constrains the choices of $P_{-i}$ to strategies complying with $\psi_{-i}$. Moreover, using $\hat{\phi}_i := \psi_{-i} \wedge \phi_i$ weakens the permissiveness requirement to $\mathcal{L}(\psi'_i) \supseteq \mathcal{L}(\phi_i \wedge \psi_{-i})$, i.e., the new APA $\psi'_i$ needs to be more general than the specification $\phi_i$ only when the assumption $\psi_{-i}$ holds. With these refined conditions for sufficiency and permissiveness, it becomes evident that an APA for specification $\phi_i$ under assumption $\psi_{-i}$ is equivalent to an APA for the modified specification $\psi_{-i} \wedge \phi_i$, as formalized below.


Definition 5. Given a $k$-player game, a specification $\phi_i$ and an assumption $\psi_{-i}$, we say that the specification $\psi'_i$ is an APA on Player $i$ for $\phi_i$ under $\psi_{-i}$ if it is an APA on Player $i$ for the specification $\psi_{-i} \wedge \phi_i$.
Following Rem. 2, we denote by COMPUTEAPA($G$, $\psi_{-i} \wedge \phi_i$, $i$) the algorithm which computes APAs on Player $i$ for $\phi_i$ under the assumption $\psi_{-i}$, even though $\psi_{-i} \wedge \phi_i$ is typically no longer a parity specification over $G$.


4.3 Computing GWSE 


Using all the intuition discussed before, we now give a semi-algorithm in Algo. 1 to compute GWSE for $k$-player games with $\omega$-regular specifications for all players. The main idea is to iteratively compute assumptions $(\psi_i)_{i\in P}$ on every player and check whether they are stable enough so that every player can satisfy their actual specification $\phi_i$ under the assumption $\psi_{-i}$. If not, then, in the next iteration, we compute new assumptions $(\psi'_i)_{i\in P}$ that are stricter than the earlier ones, i.e., $\mathcal{L}(\psi'_i) \subseteq \mathcal{L}(\psi_i)$, but still more general than their specifications under the earlier assumptions, i.e., $\mathcal{L}(\psi'_i) \supseteq \mathcal{L}(\psi_{-i} \wedge \phi_i)$.


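Algorithm 1 COMPUTEGE($\mathcal{G}$)

Require: A $k$-player game $\mathcal{G}$ with game graph $G = (V, E, v_0)$ and parity specifications $(\phi_i)_{i\in P}$.
Ensure: Either a GWSE $(\phi'_i)_{i\in P}$ or $False$.
1: $\psi_i \leftarrow True$ for all $i \in P$
2: return RECURSIVEGE($\mathcal{G}$, $(\psi_i)_{i\in P}$)
3: procedure RECURSIVEGE($\mathcal{G}$, $(\psi_i)_{i\in P}$)
4:     $\phi'_i \leftarrow \psi_i \wedge (\psi_{-i} \Rightarrow \phi_i)$ for all $i \in P$
5:     if $v_0 \in \bigcap_{i\in P}\langle\!\langle i\rangle\!\rangle\phi'_i$ then
6:         return $(\phi'_i)_{i\in P}$
7:     $\psi'_i \leftarrow \psi_i \wedge$ COMPUTEAPA($G$, $\psi_{-i} \wedge \phi_i$, $i$) for all $i \in P$
8:     if $\psi'_i = \psi_i$ for all $i \in P$ then
9:         return $False$
10:    return RECURSIVEGE($\mathcal{G}$, $(\psi'_i)_{i\in P}$)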


More specifically, we start with $\psi_i = True$ for each $i \in P$ in the first iteration (line 1). In every iteration, we then want each player to satisfy $\phi'_i = \psi_i \wedge (\psi_{-i} \Rightarrow \phi_i)$ (computed in line 4) by themselves, i.e., always satisfy their assumption $\psi_i$ and satisfy the specification $\phi_i$ whenever the others satisfy their assumptions $\psi_{-i}$. In this part of the algorithm, it is correct to use this implication-style specification. It is used for solving a zero-sum 2-player game for the specification $\phi'_i$ between Player $i$ and its opponent (i.e., the collection of all other players in $P_{-i}$). The winning regions $\langle\!\langle i\rangle\!\rangle\phi'_i$ of these zero-sum 2-player games are then intersected in line 5. This yields the winning region that is achievable by any strategy profile $(\pi_i)_{i\in P}$ where $\pi_i$ is a winning strategy of Player $i$ w.r.t. $\phi'_i$ (in a zero-sum sense). If this resulting winning region contains the initial vertex, we return the specifications $(\phi'_i)_{i\in P}$ (line 6), which are proven to indeed be a GWSE in Thm. 1.

If this is not the case, we keep strengthening the APAs, as discussed in Sec. 4.2, to make the above-mentioned zero-sum 2-player games easier to solve (as they can now rely on tighter assumptions). Hence, we call COMPUTEAPA with the modified specifications $\hat{\phi}_i := \psi_{-i} \wedge \phi_i$ for all players (line 7). If this assumption refinement step was unsuccessful, i.e., the assumptions have not changed (line 8), we give up and return $False$. Otherwise, we recheck the termination condition for the newly computed APAs.
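The control flow of Algo. 1 can be mirrored in Python under strong simplifying assumptions of our own. Every specification is modelled as the finite set of plays it accepts, so $\wedge$, $\vee$ and $\Rightarrow$ become set operations over a fixed universe. Both the zero-sum realizability check of line 5 and COMPUTEAPA are passed in as oracles; `realizable` and `compute_apa` are hypothetical names. The sketch does not solve any game. It only retraces lines 1-10, with the recursion unrolled into a loop and an artificial round limit.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, Optional

Lang = FrozenSet[Hashable]  # toy model: a specification is the finite set of plays it accepts


def conj(universe: Lang, langs: Iterable[Lang]) -> Lang:
    """Language of a conjunction; the empty conjunction is True (the whole universe)."""
    result = universe
    for lang in langs:
        result = result & lang
    return result


def compute_ge(
    universe: Lang,
    specs: Dict[int, Lang],
    realizable: Callable[[int, Lang], bool],   # hypothetical oracle for "v_0 in <<i>>spec"
    compute_apa: Callable[[int, Lang], Lang],  # hypothetical oracle standing in for COMPUTEAPA
    max_rounds: int = 1000,                    # our own cut-off; Algo. 1 has none
) -> Optional[Dict[int, Lang]]:
    players = list(specs)
    psi = {i: universe for i in players}  # line 1: psi_i <- True
    for _ in range(max_rounds):           # the recursion of RECURSIVEGE, unrolled into a loop
        psi_minus = {i: conj(universe, (psi[j] for j in players if j != i)) for i in players}
        # line 4: phi'_i <- psi_i and (psi_{-i} => phi_i); "=>" is complement-union on languages
        phi_prime = {i: psi[i] & ((universe - psi_minus[i]) | specs[i]) for i in players}
        if all(realizable(i, phi_prime[i]) for i in players):  # line 5
            return phi_prime                                   # line 6
        # line 7: psi'_i <- psi_i and COMPUTEAPA(G, psi_{-i} and phi_i, i)
        new_psi = {i: psi[i] & compute_apa(i, psi_minus[i] & specs[i]) for i in players}
        if new_psi == psi:  # line 8: no assumption changed
            return None     # line 9: False
        psi = new_psi       # line 10: next round
    return None  # cut-off reached (also reported as None in this sketch)
```

With the toy oracle `compute_apa=lambda i, L: L`, which trivially satisfies permissiveness, two toy specifications $\{a,b\}$ and $\{a,c\}$ over the universe $\{a,b,c\}$ behave as follows. If `realizable` accepts any language containing play `a`, the sketch returns them unchanged in the first round. If `realizable` only accepts the full universe, the assumptions shrink to $\{a\}$ and the sketch stops at line 9.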


Example 1. Before proving the correctness of the (semi-)Algo. 1, let us first illustrate its steps using the example depicted in Fig. 2. In line 1, we begin with $\psi_1 = \psi_2 = True$ and run the recursive procedure RECURSIVEGE in line 2.

Within the first iteration of RECURSIVEGE, in line 4, we set $\phi'_i = \phi_i$, as $\psi_i = True$ for all $i \in [1;2]$. Then, in line 5, we check whether each player can satisfy $\phi'_i = \phi_i$ without cooperation (i.e., in a zero-sum sense) from the initial vertex $v_0$. As no player can ensure that, we move to line 7. Here, as $\psi_i = True$ for $i \in [1;2]$, the new assumption $\psi'_i$ is an APA computed by COMPUTEAPA($G$, $\phi_i$, $i$). This gives us $\psi'_1 = \Box\neg(e_{12} \vee e_{34}) \wedge \Diamond\Box\neg e_{10}$ and $\psi'_2 = \Diamond\Box\neg e_{00}$, where $e_{ij} = v_i \wedge \bigcirc v_j$. Intuitively, $\psi'_1$ ensures that the edges $v_1 \to v_2$ and $v_3 \to v_4$, which lead to the region from which it is not possible to satisfy $\phi_1$, are never taken. It also ensures that the edge $v_1 \to v_0$, which keeps the play from progressing towards the target vertex $v_5$ (as in $\phi_1$), is eventually never taken. Similarly, $\psi'_2$ ensures that the edge $v_0 \to v_0$ is eventually never taken, which ensures progress towards $\phi_2$'s target vertices $\{v_4, v_5\}$. As $\psi'_i \neq \psi_i$ for all $i \in [1;2]$ in line 8, we go to the next iteration of RECURSIVEGE.

In the second iteration, we again compute the new potential GWSE $(\phi'_1, \phi'_2)$ with $\phi'_i = \psi_i \wedge (\psi_{-i} \Rightarrow \phi_i)$ in line 4. In line 5, we find that $v_0 \notin \langle\!\langle 1\rangle\!\rangle\phi'_1$. That is because Player 1 cannot ensure satisfying $\phi_1$ even when Player 2 satisfies $\psi_2$: Player 2 can always use the edge $v_0 \to v_3$, leading to the play $(v_0 v_3)^\omega \not\models \phi_1$. Hence, in line 7, the APA under $\psi_1$ gives a more restrictive assumption on Player 2: $\psi'_2 = \Diamond\Box\neg(e_{00} \vee e_{03})$. As the assumption $\psi_2$ on Player 2 was very weak, the APA for Player 1 under $\psi_2$ results in the same assumption as $\psi_1$, and hence, $\psi'_1 = \psi_1$. Then, we move to the third iteration.

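In this iteration, we find in line 5 that both players can indeed satisfy their new specifications $\phi'_i$ from the initial vertex. Hence, we finally return a GWSE $(\phi'_1, \phi'_2)$ with $\phi'_i = \psi_i \wedge (\psi_{-i} \Rightarrow \phi_i)$, where $\psi_2 = \Diamond\Box\neg(e_{00} \vee e_{03})$ and $\psi_1 = \Box\neg(e_{12} \vee e_{34}) \wedge \Diamond\Box\neg e_{10}$.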
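The assumptions in Example 1 are built from two edge patterns: $\Box\neg e$ ($e$ is never taken) and $\Diamond\Box\neg e$ ($e$ is eventually never taken). On a lasso play $u \cdot w^\omega$, the first means that $e$ never occurs, and the second means that $e$ does not occur on the loop. The following Python sketch is an illustration only, and its function names are hypothetical. It does not reproduce the graph of Fig. 2. It only replays the counter-play $(v_0 v_3)^\omega$ from the second iteration, reading $e_{ij}$ as the edge $(v_i, v_j)$ and taking $\phi_1 = \Box\Diamond\{v_5\}$. The play satisfies $\psi_1$ and $\psi_2$ but not $\phi_1$, and it is excluded by the refined $\psi'_2$.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]


def lasso_edges(prefix: List[str], loop: List[str]) -> Tuple[Set[Edge], Set[Edge]]:
    """Edges taken at least once, and edges taken infinitely often, on prefix . loop^omega."""
    word = prefix + loop
    once = {(word[i], word[i + 1]) for i in range(len(word) - 1)}
    once.add((loop[-1], loop[0]))  # the edge closing the loop
    inf = {(loop[i], loop[(i + 1) % len(loop)]) for i in range(len(loop))}
    return once, inf


def never(edges: Set[Edge], prefix: List[str], loop: List[str]) -> bool:
    """Box not (e_1 or ... or e_n): none of the edges is ever taken."""
    return not (edges & lasso_edges(prefix, loop)[0])


def eventually_never(edges: Set[Edge], prefix: List[str], loop: List[str]) -> bool:
    """Diamond Box not (e_1 or ... or e_n): none of the edges is taken infinitely often."""
    return not (edges & lasso_edges(prefix, loop)[1])


def inf_often(targets: Set[str], loop: List[str]) -> bool:
    """Box Diamond targets: some target vertex lies on the loop."""
    return any(v in targets for v in loop)


# The counter-play (v0 v3)^omega from the second iteration of Example 1
prefix, loop = [], ["v0", "v3"]
psi1 = never({("v1", "v2"), ("v3", "v4")}, prefix, loop) and eventually_never({("v1", "v0")}, prefix, loop)
psi2 = eventually_never({("v0", "v0")}, prefix, loop)
psi2_refined = eventually_never({("v0", "v0"), ("v0", "v3")}, prefix, loop)
phi1 = inf_often({"v5"}, loop)
print(psi1, psi2, psi2_refined, phi1)  # True True False False
```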


Remark 3. Let us remark that for the game depicted in Fig. 2, assume-admissible (AA) synthesis [5] has no solution. AA-synthesis utilizes a different, incomparable definition of rationality based on a dominance order. In their framework, a Player $i$ strategy $\pi_i$ is said to be dominated by $\pi'_i$ if the set of strategy profiles that $\pi'_i$ is winning against (i.e., satisfies Player $i$'s specification against) is strictly larger than that of $\pi_i$. A strategy not dominated by any other strategy is called admissible. In AA-synthesis, one needs to find an admissible strategy $\pi_i$ for Player $i$ such that for every admissible strategy $\pi'_{-i}$ of the other player, $(\pi_i, \pi'_{-i}) \models \phi_i$. In this example, Player 1 has only one admissible strategy $\pi_1$, which always uses $v_1 \to v_5$ and $v_3 \to v_0$. However, with the admissible strategy $\pi'_2$ of Player 2 that always uses $v_0 \to v_3$, we have $(\pi_1, \pi'_2) \not\models \phi_1$.

Figure 2: A two-player game with initial vertex $v_0$, Player 1's vertices (squares), Player 2's vertices (circles), and specifications $\phi_1 = \Box\Diamond\{v_5\}$ and $\phi_2 = \Box\Diamond\{v_4, v_5\}$.


The next theorem shows that Algo. 1 is indeed sound.


Theorem 1. Let $\mathcal{G}$ be a $k$-player game with game graph $G = (V, E, v_0)$ and parity specifications $(\phi_i)_{i\in P}$ such that $(\phi'_i)_{i\in P} =$ COMPUTEGE($\mathcal{G}$). Then $(\phi'_i)_{i\in P}$ is a GWSE for $\mathcal{G}$.


Proof. First, observe that COMPUTEGE did not return $False$, by the premise of the theorem. Suppose COMPUTEAPA returned $False$ in line 7, i.e., $\psi'_i = False$ for some $i \in P$, in some $n$-th iteration. Then, in the $(n+1)$-th iteration, we have $\phi'_i = False$ and $\psi_{-j} = False$ for all $j \in P_{-i}$. So, it holds that $v_0 \notin \langle\!\langle i\rangle\!\rangle\phi'_i = \langle\!\langle i\rangle\!\rangle False = \emptyset$, and hence, the algorithm does not return in line 6. Furthermore, as $\psi_{-j} \wedge \phi_j = False$ for all $j \in P_{-i}$, by sufficiency, COMPUTEAPA returns $False$ for all $j \in P_{-i}$. Hence, $\psi'_j = False$ for all $j \in P$. By similar arguments, in the $(n+2)$-th iteration, $\psi'_j = \psi_j = False$ for all $j \in P$, and hence, the algorithm would return $False$. Therefore, we can assume that COMPUTEAPA never returned $False$ in any iteration.
Now, let us claim that in every iteration of RECURSIVEGE, for all $i \in P$:

(claim 1) $\mathcal{L}(\psi_i) \supseteq \mathcal{L}(\bigwedge_{j\in P}\phi_j)$, and (claim 2) $\mathcal{L}(\psi_i) \supseteq \mathcal{L}(\psi_{-i} \wedge \phi_i)$.

We prove the claims by induction on the number of iterative calls to RECURSIVEGE. For the base case, observe that $\psi_i = True$ for all $i \in P$; hence, the claims hold trivially. For the induction step, assume that claims 1-2 hold in the $n$-th iteration. For all $i \in P$, the $\psi'_i$ computed in line 7 is $\psi_i$ in the next iteration. So it suffices to show that $\mathcal{L}(\psi'_i) \supseteq \mathcal{L}(\bigwedge_{j\in P}\phi_j)$ and $\mathcal{L}(\psi'_i) \supseteq \mathcal{L}(\psi'_{-i} \wedge \phi_i)$.

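By permissiveness of APAs (as in Def. 4), for all $i \in P$, we have $\mathcal{L}(\text{COMPUTEAPA}(G, \psi_{-i} \wedge \phi_i, i)) \supseteq \mathcal{L}(\psi_{-i}) \cap \mathcal{L}(\phi_i)$. Hence, by line 7, for all $i \in P$, we have $\mathcal{L}(\psi'_i) \supseteq \mathcal{L}(\psi_i) \cap \mathcal{L}(\psi_{-i}) \cap \mathcal{L}(\phi_i) = (\bigcap_{j\in P}\mathcal{L}(\psi_j)) \cap \mathcal{L}(\phi_i)$, and hence, by claim 1, $\mathcal{L}(\psi'_i) \supseteq \mathcal{L}(\bigwedge_{j\in P}\phi_j)$.

Similarly, for all $i \in P$, as $\mathcal{L}(\psi'_i) \supseteq \mathcal{L}(\psi_i) \cap \mathcal{L}(\psi_{-i}) \cap \mathcal{L}(\phi_i)$, by claim 2, we also have $\mathcal{L}(\psi'_i) \supseteq \mathcal{L}(\psi_{-i}) \cap \mathcal{L}(\phi_i)$. Furthermore, by line 7, for all $j \in P$, we have $\mathcal{L}(\psi_j) \supseteq \mathcal{L}(\psi'_j)$, and hence, $\mathcal{L}(\psi_{-i}) = \bigcap_{j\neq i}\mathcal{L}(\psi_j) \supseteq \bigcap_{j\neq i}\mathcal{L}(\psi'_j) = \mathcal{L}(\psi'_{-i})$. Therefore, for all $i \in P$, we have $\mathcal{L}(\psi'_i) \supseteq \mathcal{L}(\psi'_{-i}) \cap \mathcal{L}(\phi_i) = \mathcal{L}(\psi'_{-i} \wedge \phi_i)$.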

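Now, we show that Def. 3 (i)-(iii) indeed holds for the tuple $(\phi'_i)_{i\in P}$.

(i) (general) By construction, $\phi'_i = \psi_i \wedge (\psi_{-i} \Rightarrow \phi_i)$ for the assumptions $(\psi_i)_{i\in P}$ computed in the last iteration. As $\bigcap_{j\in P}\mathcal{L}(\psi_j) \subseteq \mathcal{L}(\psi_{-i})$ for every $i \in P$, it holds that

$$\mathcal{L}\Big(\bigwedge_{i\in P}\phi'_i\Big) = \Big(\bigcap_{i\in P}\mathcal{L}(\psi_i)\Big) \cap \bigcap_{i\in P}\mathcal{L}(\psi_{-i} \Rightarrow \phi_i) \subseteq \bigcap_{i\in P}\big(\mathcal{L}(\psi_{-i}) \cap \mathcal{L}(\psi_{-i} \Rightarrow \phi_i)\big) = \bigcap_{i\in P}\mathcal{L}(\psi_{-i} \wedge \phi_i) \subseteq \bigcap_{i\in P}\mathcal{L}(\phi_i) = \mathcal{L}\Big(\bigwedge_{i\in P}\phi_i\Big).$$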
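For the other direction, it holds that

$$\mathcal{L}(\phi'_i) = \mathcal{L}(\psi_i) \cap \mathcal{L}(\psi_{-i} \Rightarrow \phi_i) \supseteq \mathcal{L}(\psi_i) \cap \mathcal{L}(\phi_i). \qquad (1)$$

Then, by claim 1, for all $i \in P$, we have $\mathcal{L}(\phi'_i) \supseteq \mathcal{L}(\bigwedge_{j\in P}\phi_j)$, and hence, $\mathcal{L}(\bigwedge_{i\in P}\phi'_i) \supseteq \mathcal{L}(\bigwedge_{i\in P}\phi_i)$. Therefore, $(\phi'_i)_{i\in P}$ is general.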

(ii) (realizable) Holds trivially by line 5.

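(iii) (secure) Let $(\pi_i)_{i\in P}$ be a strategy profile with $\pi_i \models \phi'_i$. Then, every $(\pi_i)_{i\in P}$-play from $v_0$ satisfies $\phi'_i$ for all $i \in P$, and hence, $(\pi_i)_{i\in P} \models \bigwedge_{i\in P}\phi'_i$. So, by generality, we have $(\pi_i)_{i\in P} \models \bigwedge_{i\in P}\phi_i$.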

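Now, to prove item (ii) of Def. 2, let $\pi'_i$ be a strategy of Player $i$, and let $\rho$ be the $(\pi'_i, \pi_{-i})$-play from $v_0$. As before, for all $j \in P$, we have $\phi'_j = \psi_j \wedge (\psi_{-j} \Rightarrow \phi_j)$. So, for every $j \neq i$, $\rho \in \mathcal{L}(\phi'_j) \subseteq \mathcal{L}(\psi_j)$. Hence, we have $\rho \in \bigcap_{j\neq i}\mathcal{L}(\psi_j) = \mathcal{L}(\psi_{-i})$.

Now, if $\rho \in \mathcal{L}(\phi_i)$, then $\rho \in \mathcal{L}(\psi_{-i} \wedge \phi_i)$. Then, by (1) and claim 2, we have $\rho \in \mathcal{L}(\phi'_i)$. Furthermore, as $\pi_{-i} \models \phi'_{-i}$, we have $\rho \in \mathcal{L}(\phi'_{-i})$. Therefore, $\rho \in \mathcal{L}(\phi'_i \wedge \phi'_{-i})$, and by generality, $\rho \in \mathcal{L}(\phi_i \wedge \phi_{-i}) \subseteq \mathcal{L}(\phi_{-i})$. Then, by contraposition, item (ii) of Def. 2 holds for $(\pi_i)_{i\in P}$. Hence, $(\pi_i)_{i\in P}$ is a WSE, and hence, $(\phi'_i)_{i\in P}$ is secure. □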
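Part (i) of the proof splits into two inclusions. The inclusion $\mathcal{L}(\bigwedge_i \phi'_i) \subseteq \mathcal{L}(\bigwedge_i \phi_i)$ holds for arbitrary assumptions $\psi_i$, while the reverse inclusion needs claim 1. As a quick, non-authoritative sanity check of this split, the following Python sketch models languages as random subsets of a small finite universe of plays. This is a toy assumption, since actual $\omega$-regular languages are infinite. The sketch checks the first inclusion for arbitrary $\psi_i$, and equality for $\psi_i$ chosen to satisfy claim 1. All names are hypothetical.

```python
import random
from typing import Dict, FrozenSet, Iterable

Lang = FrozenSet[int]  # toy model: a language is a finite set of (indices of) plays


def big_and(universe: Lang, langs: Iterable[Lang]) -> Lang:
    """Language of a conjunction; the empty conjunction is the whole universe."""
    result = universe
    for lang in langs:
        result = result & lang
    return result


def candidate(universe: Lang, psi: Dict[int, Lang], phi: Dict[int, Lang]) -> Dict[int, Lang]:
    """phi'_i = psi_i and (psi_{-i} => phi_i), computed on finite languages."""
    out = {}
    for i in psi:
        psi_minus = big_and(universe, (psi[j] for j in psi if j != i))
        out[i] = psi[i] & ((universe - psi_minus) | phi[i])
    return out


def check(trials: int = 2000, k: int = 3, size: int = 6, seed: int = 0) -> bool:
    rng = random.Random(seed)
    universe = frozenset(range(size))

    def rand_lang() -> Lang:
        return frozenset(x for x in universe if rng.random() < 0.5)

    for _ in range(trials):
        phi = {i: rand_lang() for i in range(k)}
        target = big_and(universe, phi.values())
        # arbitrary assumptions: only L(/\ phi'_i) <= L(/\ phi_i) is guaranteed
        psi = {i: rand_lang() for i in range(k)}
        if not big_and(universe, candidate(universe, psi, phi).values()) <= target:
            return False
        # assumptions satisfying claim 1 (each psi_i contains L(/\ phi_j)): equality holds
        psi = {i: rand_lang() | target for i in range(k)}
        if big_and(universe, candidate(universe, psi, phi).values()) != target:
            return False
    return True
```

Calling `check()` is expected to return `True`, matching the two inclusions shown in part (i).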


4.4 Games with an Environment Player 


Up to this point, we have only considered games played between k players, each 
representing a distinct system. However, in the context of reactive synthesis 
problems, a different setup is often encountered. Here, the system players play 
against an environment player, who is considered as being adversarial toward all 
the system players. Consequently, the system players must fulfill their objectives 
against all possible strategies employed by the environment player. 

Interestingly, this framework can be seen as equivalent to a $(k+1)$-player game with the original $k$ system players and a $(k+1)$-th player representing the environment. For this new player, the objective is simply $\phi_{k+1} = True$. Then, it is easy to see that an APA for such a specification $\phi_{k+1}$ under any assumption is $True$. Hence, in each iteration of RECURSIVEGE in Algo. 1, the associated assumption $\psi_{k+1}$ is also $True$, and thus, $\phi'_{k+1} = True \wedge ((\bigwedge_{i\in[1;k]}\psi_i) \Rightarrow True) = True$. Consequently, if COMPUTEGE yields a GWSE $(\phi'_i)_{i\in[1;k+1]}$, the new objective of the environment player, $\phi'_{k+1} = True$, does not impose any constraints on the environment's actions. Therefore, the tuple $(\phi'_i)_{i\in[1;k]}$ remains secure (as in Def. 3) for the $k$ system players, because the environment player can never violate its new specification $\phi'_{k+1}$. In sum, games featuring an environment player can be effectively handled as a special case, as formally summarized below:


Corollary 1. Let $G = (V, E)$ be a game graph with $k$ system players, i.e., $P = [1;k]$, and an environment player $env$ such that $V = (\biguplus_{i\in P} V_i) \uplus V_{env}$. Let $(\phi_i)_{i\in P}$ be the tuple of specifications, one for each system player. Then, a tuple $(\phi'_i)_{i\in P}$ is a GWSE for $(G, (\phi_i)_{i\in P})$ if and only if $(\phi'_i)_{i\in P\cup\{env\}}$ with $\phi'_{env} = True$ is a GWSE for the $(k+1)$-player game $(G, (\phi_i)_{i\in P\cup\{env\}})$ with $\phi_{env} = True$.


Furthermore, in synthesis problems, the choices of the environment are
sometimes restricted by a certain assumption $\varphi_{env}$. In such scenarios, a viable
approach is to update each system player's specification $\varphi_i$ to $\varphi_{env} \Rightarrow \varphi_i$
and subsequently use Cor. 1 to compute a GWSE. An alternative approach
is to consider a (k + 1)-player game with specification $\varphi_{k+1} = \varphi_{env}$ for the
(k + 1)-th player. With this approach, the solution becomes more meaningful, as any
strategy profile for the system players satisfying the resulting GWSE allows the
environment to satisfy its own assumption $\varphi_{env}$. This approach nicely complements
existing works [14, 25] that aim to synthesize strategies for systems while
allowing the environment to fulfill its own requirements.
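The two ways of handling an environment assumption only differ in how the specification profile is assembled before the GWSE computation. The sketch below shows just that bookkeeping step. It assumes a hypothetical tuple-based formula representation (no real temporal-logic library is used), that system players are numbered 1..k, and that the environment becomes player k + 1; the GWSE computation itself is not shown.

```python
# Hypothetical formula AST: formulas are plain tuples, e.g. ("p1",) or ("true",).
TRUE = ("true",)


def implies(a, b):
    """Build the (hypothetical) formula node a => b."""
    return ("implies", a, b)


def env_as_premise(specs, phi_env):
    """Option 1: phi_i := phi_env => phi_i for every system player i in 1..k;
    the environment, player k + 1, gets the trivial objective True (Cor. 1)."""
    profile = {i: implies(phi_env, phi) for i, phi in specs.items()}
    profile[len(specs) + 1] = TRUE
    return profile


def env_as_player(specs, phi_env):
    """Option 2: keep every phi_i and give player k + 1 the objective phi_env."""
    profile = dict(specs)
    profile[len(specs) + 1] = phi_env
    return profile
```

With the second option, any strategy profile compatible with the resulting GWSE also lets the environment meet $\varphi_{env}$, which is exactly the advantage described above.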


4.5 Partially Winning GWSE 


In the preceding sections, we have presented a method for computing winning
SE, i.e., equilibria where all players satisfy their objectives. However, it is worth
noting that in certain scenarios, WSE might not exist (see e.g. [13] for a detailed
discussion). In such cases, a subset P' of players can still form a coalition, which
serves their interests by enabling them to compute a GWSE for their coalition
only, while treating the remaining players in P \ P' as part of the environment.
This can be accomplished by computing a GWSE with updated specifications
$(\tilde{\varphi}_i)_{i \in P}$, where $\tilde{\varphi}_i = \varphi_i$ for all $i \in P'$ and $\tilde{\varphi}_i = \mathrm{True}$ for all $i \notin P'$. This
scenario aligns with the treatment of an environment in Sec. 4.4.
It is important to emphasize that for instances where no WSE exists, there
might not even exist a unique maximal outcome for which an SE is feasible;
see [13, Sec. 5] for a simple example. As a result, there may be multiple coalitions
that offer different advantages to individual players from the initial vertex.
This scenario presents an intriguing, unexplored challenge for future research.
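The coalition construction is again a simple rewriting of the specification profile. The following sketch, using the same hypothetical tuple-based formulas as a stand-in for real specifications, computes $(\tilde{\varphi}_i)_{i \in P}$ for a chosen coalition P'; choosing which coalition to form is left open, as discussed above.

```python
TRUE = ("true",)  # hypothetical representation of the formula True


def coalition_specs(specs, coalition):
    """Return phi~_i = phi_i for players in the coalition P', and True otherwise.

    specs: dict mapping each player i in P to its specification phi_i.
    coalition: set of players P' (must be a subset of P).
    """
    unknown = set(coalition) - set(specs)
    if unknown:
        raise ValueError(f"players not in P: {sorted(unknown)}")
    return {i: (phi if i in coalition else TRUE) for i, phi in specs.items()}
```

Players outside P' then behave like the environment player of Cor. 1, whose objective True can never be violated.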


4.6 Computational Tractability and Termination 


While Algo. 1 has multiple desirable properties, additionally supported by the
possible extensions discussed in Sec. 4.4 and 4.5, its computational tractability
and termination are questionable for the full class of ω-regular games.

As pointed out in Rem. 2, the application of COMPUTEAPA might require
changing the game graph if the input is not a parity specification. While
the language of the computed APA is guaranteed to shrink in every iteration (see
the proof of Thm. 1), this does not guarantee termination of Algo. 1, as such a
language still contains an infinite number of words. Due to the possibly repeated
changes in the game graph for APA computation, the finiteness of the underlying
model cannot be used as a termination argument either.

In addition, the need to change game graphs induces a severe computational
burden. While this might not be so obvious for the polynomial-time algorithm
COMPUTEAPA, it is also the case for the (zero-sum) game solver that
needs to be invoked in line 5 of Algo. 1. As the specification for these games also
keeps changing in each iteration, a new parity game needs to be constructed in
each iteration, which might be increasingly harder to solve, depending on the
nature of the added assumptions. We will see in Sec. 5 how these problems can
be resolved by a suitable restriction of the considered assumption class.
5 Optimized Computation of GWSE in Parity Games


As discussed in Sec. 4.6, the potential need to repeatedly change game graphs in
the computations of lines 5 and 7 in Algo. 1 might incur increasing computational
costs and prevents a termination guarantee. To circumvent these problems, this
section proposes a different algorithm for GWSE synthesis which overapproximates
APA’s by a simpler assumption class, called UCA’s. The resulting algorithm
is computationally more tractable and guaranteed to terminate. Nevertheless,
unlike the semi-algorithm discussed in the previous section, this algorithm may
not be able to compute a GWSE in all scenarios where the semi-algorithm can.


5.1 From APA's to UCA’s 


One of the main features of APA’s on Player i computed by COMPUTEAPA from
[1] is the fact that they can be expressed by well-structured templates using
Player i's edges, namely unsafe-edge-, colive-edge-, and (conditional) live-group-templates.
Unsafe- and colive-edge-templates are structurally very simple. Given
a set of unsafe edges $S \subseteq E_i$ and colive edges $C \subseteq E_i$, the respective assumption
templates $\Psi_{\mathrm{UNSAFE}}(S) := \bigwedge_{e \in S} \square \neg e$ and $\Psi_{\mathrm{COLIVE}}(C) := \bigwedge_{e \in C} \lozenge \square \neg e$ simply assert
that unsafe (resp. colive) edges should never (resp. only finitely often) be taken.
We call an assumption which can be expressed by these two types of templates
an Unsafe- and Colive-edge-template Assumption (UCA), as defined next.
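For an ultimately periodic run, i.e., a finite prefix followed by a cycle repeated forever, both templates can be checked directly on the edges of the run: an edge is taken at all iff it appears in the prefix or the cycle, and it is taken infinitely often iff it appears in the cycle. The sketch below illustrates this reading of the two templates; the lasso representation and the function name are our own assumptions, not part of the paper.

```python
def satisfies_uca(prefix, cycle, unsafe, colive):
    """Check the lasso run prefix . cycle^omega against
    Psi_UNSAFE(S) and Psi_COLIVE(C).

    prefix, cycle: lists of edges (u, v); cycle must be non-empty.
    unsafe (S), colive (C): collections of edges.
    """
    if not cycle:
        raise ValueError("an infinite run needs a non-empty cycle")
    taken = set(prefix) | set(cycle)
    # Psi_UNSAFE(S): no edge of S is ever taken.
    if taken & set(unsafe):
        return False
    # Psi_COLIVE(C): edges of C are taken only finitely often, i.e. never on the cycle.
    if set(cycle) & set(colive):
        return False
    return True
```

For example, taking a colive edge once in the prefix is allowed, whereas taking it on the cycle violates $\Psi_{\mathrm{COLIVE}}(C)$.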


Definition 6. Given a k-player game graph G = (V, E), a specification ψ is
called an unsafe- and colive-edge-template assumption (UCA) for Player i, if
there exist sets $S, C \subseteq E_i$ s.t. $\psi := \Psi_{\mathrm{UNSAFE}}(S) \wedge \Psi_{\mathrm{COLIVE}}(C)$. We write $\psi^{[S,C]}$ to
denote such assumptions.


It was recently shown by Schmuck et al. [27] that two-player (zero-sum) parity
games under UCA assumptions, i.e., games $(G, \psi \Rightarrow \varphi)$ where ψ is a UCA and
φ is a parity specification over G, can be directly solved over G without computational
overhead, compared to the non-augmented version (G, φ) of the same
game. Interestingly, the synthesis problem under assumptions becomes provably
harder if live-group-templates $\Psi_{\mathrm{COND}}$ are needed to express an assumption,
requiring a change of the game graph in most cases. Conditional live-group-templates
$\Psi_{\mathrm{COND}}$ are structurally more challenging than UCA’s, as they impose
Streett-type fairness conditions on edges in G (see [1, Sec. 4] for details).

Motivated by this result, we will restrict the assumption class used for GWSE
computation to UCA's in this section. Unfortunately, UCA’s are typically not
expressive enough to capture APA’s for parity games. This follows from one
of the main results of Anand et al. [1], which shows that APA's computed by
COMPUTEAPA for parity games are expressible by a conjunction of all three
template types, as restated in the following proposition.


Proposition 1 ([1, Thm. 3]). Given the premises of Lem. 1, the APA computed
by COMPUTEAPA on Player i can be written as the conjunction

$\psi := \Psi_{\mathrm{UNSAFE}}(S) \wedge \Psi_{\mathrm{COLIVE}}(C) \wedge \Psi_{\mathrm{COND}}$ where $S, C \subseteq E_i$.
We therefore need to overapproximate APA’s by UCA’s, by simply dropping
the $\Psi_{\mathrm{COND}}$-term from their defining conjunction, as formalized next.


Definition 7. Given the premises of Lem. 1, let $\psi := $ COMPUTEAPA$(G, \varphi, i) =
\Psi_{\mathrm{UNSAFE}}(S) \wedge \Psi_{\mathrm{COLIVE}}(C) \wedge \Psi_{\mathrm{COND}}$. Then we denote by APPROXAPA$(G, \varphi, i)$ the
algorithm that computes $\psi^{[S,C]}$ by first executing COMPUTEAPA$(G, \varphi, i)$ and
then dropping all $\Psi_{\mathrm{COND}}$-terms from the resulting APA.
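Operationally, APPROXAPA is a filter on the templates returned by COMPUTEAPA. The sketch below assumes a hypothetical representation of an APA as a list of `(kind, edges)` pairs, with kinds `"unsafe"`, `"colive"` and `"cond"`; COMPUTEAPA itself is not modelled. It collects the edge sets S and C of $\psi^{[S,C]}$ and discards every conditional live-group-template.

```python
def approx_apa(templates):
    """Drop all conditional live-group templates from a (hypothetical) APA
    representation and return the edge sets (S, C) of the UCA psi^[S,C].

    templates: list of (kind, edges) with kind in {"unsafe", "colive", "cond"}.
    """
    S, C = set(), set()
    for kind, edges in templates:
        if kind == "unsafe":
            S |= set(edges)
        elif kind == "colive":
            C |= set(edges)
        elif kind != "cond":  # "cond" templates are simply ignored
            raise ValueError(f"unknown template kind: {kind}")
    return frozenset(S), frozenset(C)
```

Since conjuncts are only removed, every run satisfying the original APA still satisfies the result, which is the language inclusion used in the next paragraph.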


It is easy to see that $L(\psi) \subseteq L(\psi^{[S,C]})$. Therefore, it also follows that $\psi^{[S,C]}$
is implementable and permissive (i.e., Def. 4 (ii) and (iii) hold). Unfortunately,
$\psi^{[S,C]}$ is in general no longer sufficient (i.e., Def. 4 (i) does not necessarily hold).
As the proof of Thm. 1 only uses permissiveness of APA, even though sufficiency
is lost for UCA's, replacing COMPUTEAPA by APPROXAPA in Algo. 1 does
not compromise soundness, i.e., whenever COMPUTEGE terminates in line 6 with
a specification profile $(\psi_i)_{i \in P}$, this profile is indeed a GWSE, even if APA’s are
over-approximated by UCA’s. This is formalized next.


Theorem 2. Let ACOMPUTEGE be the algorithm obtained by replacing procedure
COMPUTEAPA by APPROXAPA in Algo. 1. Then, given a k-player game
G with parity specifications such that $(\psi_i)_{i \in P} = $ ACOMPUTEGE(G), the tuple
$(\psi_i)_{i \in P}$ is a GWSE for G.


The rest of this section will now show how the restriction to UCA’s allows us
to execute lines 5 and 7 in Algo. 1 efficiently and to prove termination of
the resulting algorithm for GWSE computation.


5.2 Iterative Computation of UCA’s 


We have seen in the previous section that UCA's can be computed by utilizing
COMPUTEAPA and dropping all $\Psi_{\mathrm{COND}}$ terms (called APPROXAPA). Of course,
this can be done in every iteration of COMPUTEGE. However, COMPUTEAPA
expects a parity game as an input, and from the second iteration of COMPUTEGE
onward the input to COMPUTEAPA is given by $(G, \psi_{-i} \wedge \varphi_i, i)$, where $\psi_{-i}$ is an
assumption on the players in $P_{-i}$, so the input is not necessarily a parity game.

This section therefore provides a new algorithm, called COMPUTEUCA and
given in Algo. 2, which computes UCA’s for Player i directly on the game graph
G for games $(G, \psi \wedge \varphi)$ where $\psi = \psi^{[S,C]}$ is a UCA for $P_{-i}$ with unsafe edges
$S \subseteq E_{-i}$ and colive edges $C \subseteq E_{-i}$, and φ is a parity specification, both over G.
Intuitively, COMPUTEUCA first slightly modifies G to a new two-player game
graph $\hat{G}$ (lines 1 and 2) s.t. the specification $\psi \wedge \varphi$ can be directly expressed
as a parity specification $\hat{\varphi}$ on $\hat{G}$ (line 4). This allows us to apply APPROXAPA to
construct and return a UCA for Player 1 on $\hat{G}$ (line 5). As the resulting UCA
is for Player i, the unsafe edge and colive edge sets are subsets of $E_i$. Further,
due to the mild modifications from G to $\hat{G}$, the edges of Player i are retained in
$\hat{G}$ as $\hat{E}_1$; hence, the resulting UCA is a well-defined UCA for Player i in G.

We have the following soundness result showing equivalence between the
UCA's computed by COMPUTEUCA and APPROXAPA for UCA assumptions,
proven in the extended version of this paper [26, App. A].
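Algorithm 2 COMPUTEUCA$(G, \psi^{[S,C]} \wedge \varphi, i)$
Require: A k-player game graph $G = (V, E, v_0)$ and specification $\psi \wedge \varphi$ with UCA
$\psi = \psi^{[S,C]}$ for $P_{-i}$, i.e., $S, C \subseteq E_{-i}$, and $\varphi = \mathrm{Parity}(\Omega)$ s.t. $\Omega : V \to [0; 2d+1]$.
Ensure: A UCA $\psi^{[S',C']}$ for Player i.
1: $\hat{V}_1 \leftarrow V_i$ and $\hat{V}_2 \leftarrow V_{-i} \uplus C$
2: $\hat{E}_1 \leftarrow E_i$ and $\hat{E}_2 \leftarrow (E_{-i} \setminus (S \cup C)) \cup \{(u, c), (c, v) \mid c = (u, v) \in C\}$
3: $\hat{\Omega}(v) \leftarrow \Omega(v)$ if $v \in V$, and $\hat{\Omega}(v) \leftarrow 2d + 1$ otherwise
4: $\hat{G} \leftarrow (\hat{V}_1 \uplus \hat{V}_2, \hat{E}_1 \uplus \hat{E}_2, v_0)$; $\hat{\varphi} \leftarrow \mathrm{Parity}(\hat{\Omega})$
5: return APPROXAPA$(\hat{G}, \hat{\varphi}, 1)$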
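Lines 1 to 4 of Algo. 2 are a purely local graph transformation, sketched below under a few assumptions of ours: vertices are hashable values distinct from edge tuples, each colive edge c = (u, v) doubles as the name of its fresh vertex, and $P_{-i}$ is collapsed into a single opponent. Unsafe edges of $P_{-i}$ are deleted, and colive edges are routed through a fresh vertex with the maximal odd priority 2d + 1. Under a max-parity reading (our assumption), passing such a vertex infinitely often violates $\hat{\varphi}$, which mirrors "colive edges only finitely often". Line 5 (APPROXAPA) is not reproduced.

```python
def build_hat_game(V_i, V_rest, E, S, C, omega, d):
    """Sketch of lines 1-4 of COMPUTEUCA (Algo. 2).

    V_i: vertices of Player i; V_rest: vertices of the players in P_{-i};
    E: set of edges (u, v); S, C: unsafe / colive edges of P_{-i};
    omega: dict vertex -> priority in [0, 2d+1].
    Returns (V1, V2, E1, E2, omega_hat) describing G^ and phi^ = Parity(omega_hat).
    """
    S, C = set(S), set(C)
    V1 = set(V_i)                                   # line 1: V^_1 <- V_i
    V2 = set(V_rest) | C                            #         V^_2 <- V_{-i} plus one vertex per c in C
    E1 = {(u, v) for (u, v) in E if u in V1}        # line 2: E^_1 <- E_i
    E2 = {(u, v) for (u, v) in E
          if u in V_rest and (u, v) not in S and (u, v) not in C}
    for c in C:                                     # reroute c = (u, v) through the new vertex c
        u, v = c
        E2 |= {(u, c), (c, v)}
    omega_hat = {v: omega[v] for v in V1 | set(V_rest)}   # line 3: old vertices keep priorities
    for c in C:
        omega_hat[c] = 2 * d + 1                    #         new vertices get priority 2d + 1
    return V1, V2, E1, E2, omega_hat
```

The edges of Player i are copied unchanged into $\hat{E}_1$, which is why a UCA computed for Player 1 on $\hat{G}$ can be read back as a UCA for Player i on G.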


Proposition 2. Given game graph $G = (V, E, v_0)$ with parity specification φ
and a UCA $\psi = \psi^{[S,C]}$ for $P_{-i}$, let $\psi' := $ APPROXAPA$(G, \psi \wedge \varphi, i)$ and $\psi'' := $
COMPUTEUCA$(G, \psi \wedge \varphi, i)$; then $L(\psi') = L(\psi'')$. Furthermore, COMPUTEUCA
terminates in time $O((|V| + |E|)^4)$.


The proof of this result is given in the extended version [26, App. A] and essentially
relies on the observation that the parity specification $\hat{\varphi}$ on $\hat{G}$ expresses the
language $L(\psi \wedge \varphi)$ when restricted to V, i.e., $L(\hat{G}, \hat{\varphi})|_V = L(G, \psi \wedge \varphi)$, and on the
fact that every UCA for Player 1 in $\hat{G}$ is also a UCA for Player i in G.

The usefulness of expressing the computed assumptions as unsafe and colive
edge sets S, C over the input game graph G is that there are only finitely many
edges in that graph. Therefore, there are also only finitely many
unsafe or colive edge sets, which could all be enumerated in the worst case.
Hence, computing UCA's on the same game graph in every iteration
ensures termination of the overall computation of GWSE.


5.3 Solving Parity Games under UCA's 


As the final step towards an optimized version of Algo. 1, we now address the
computations required in line 5 of Algo. 1. Observe that this line requires
checking $v_0 \in \bigcap_{i \in P} \langle\langle i \rangle\rangle \varphi'_i$ for $\varphi'_i = \psi_i \wedge (\psi_{-i} \Rightarrow \varphi_i)$. If this check returns True, the
algorithm terminates; if it returns False, new assumptions are computed. In both
cases, the game graph used to check this condition has no effect on
the future behavior of the algorithm.

Nevertheless, we utilize the recent result by Schmuck et al. [27] to compute
$\langle\langle i \rangle\rangle \varphi'_i$ more efficiently if $\psi_i$ and $\psi_{-i}$ are UCA's on Player i and $P_{-i}$, respectively.
The construction uses the same idea as presented in Algo. 2 to encode UCA's into
a new, slightly modified two-player parity game $(\hat{G}, \hat{\varphi})$, which can then be solved
by a standard parity solver, such as Zielonka's algorithm [30]. This returns the
winning region $\hat{W}$ of Player 1 in the new game, which corresponds to the winning
region of Player i in G. The resulting algorithm, called COMPUTEWIN, is given
in the extended version [26, Algo. 3] and has the property that $v_0 \in \langle\langle i \rangle\rangle(G, \varphi'_i)$ if
and only if $v_0 \in \hat{W}$. This is formalized and proven in the extended version [26,
Prop. 3].
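For readers unfamiliar with the solver mentioned above, the following is a textbook-style sketch of Zielonka's recursive algorithm for two-player parity games. It is only the generic solver, not COMPUTEWIN itself (whose encoding is in [26]). We assume the max-parity convention in which player 0 wins a play iff the largest priority seen infinitely often is even, and that every vertex has at least one successor; whether this matches the exact convention of [30] is an assumption on our part.

```python
def attractor(vertices, succ, owner, player, target):
    """Vertices of the subgame `vertices` from which `player` can force a visit to `target`."""
    local = {v: {w for w in succ[v] if w in vertices} for v in vertices}
    preds = {v: set() for v in vertices}
    for v in vertices:
        for w in local[v]:
            preds[w].add(v)
    remaining = {v: len(local[v]) for v in vertices}  # opponent exits not yet in attr
    attr = set(target) & vertices
    queue = list(attr)
    while queue:
        w = queue.pop()
        for v in preds[w]:
            if v in attr:
                continue
            if owner[v] == player:          # player can choose the edge into attr
                attr.add(v)
                queue.append(v)
            else:
                remaining[v] -= 1
                if remaining[v] == 0:       # every opponent move leads into attr
                    attr.add(v)
                    queue.append(v)
    return attr


def zielonka(vertices, succ, owner, priority):
    """Return (W0, W1), the winning regions of players 0 and 1 on `vertices`."""
    vertices = set(vertices)
    if not vertices:
        return set(), set()
    p = max(priority[v] for v in vertices)
    i = p % 2                                # player favoured by the top priority
    top = {v for v in vertices if priority[v] == p}
    A = attractor(vertices, succ, owner, i, top)
    W = zielonka(vertices - A, succ, owner, priority)
    if not W[1 - i]:                         # opponent wins nowhere: player i wins everywhere
        result = [set(), set()]
        result[i] = vertices
        return result[0], result[1]
    B = attractor(vertices, succ, owner, 1 - i, W[1 - i])
    W2 = zielonka(vertices - B, succ, owner, priority)
    result = [set(W2[0]), set(W2[1])]
    result[1 - i] |= B
    return result[0], result[1]
```

In the setting above, one would call such a solver on $(\hat{G}, \hat{\varphi})$ and test whether $v_0$ lies in the region of the player standing for Player i; the recursion depth grows with the number of priorities, which is the source of the exponential factor in Thm. 3.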
5.4 Computation of GWSE via UCA’s


With the previously discussed algorithms in place, we are now in a position to
propose an optimized, surely terminating algorithm to compute GWSE, called
OCOMPUTEGE. Within COMPUTEGE, the recursive procedure RECURSIVEGE
is replaced by one which uses the algorithms COMPUTEUCA and COMPUTEWIN
for UCA’s from Sec. 5.2 and 5.3, as follows:
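1: procedure RECURSIVEGE$(G, (\psi_i)_{i \in P})$
2:   $\varphi'_i \leftarrow \psi_i \wedge (\psi_{-i} \Rightarrow \varphi_i)$   $\forall i \in P$
3:   $W_i \leftarrow$ COMPUTEWIN$(G, \varphi'_i, i)$
4:   if $v_0 \in \bigcap_{i \in P} W_i$ then
5:     return $(\psi_i)_{i \in P}$
6:   $\psi'_i \leftarrow \psi_i \wedge$ COMPUTEUCA$(G, \psi_{-i} \wedge \varphi_i, i)$   $\forall i \in P$
7:   if $\psi'_i = \psi_i$ for all $i \in P$ then
8:     return False
9:   return RECURSIVEGE$(G, (\psi'_i)_{i \in P})$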
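Because every UCA is determined by its two edge sets and the conjunction of two UCAs is the UCA over the unions of their edge sets, the procedure above can be read as a loop over pairs (S_i, C_i). The sketch below shows this control flow only. `compute_win` and `compute_uca` are hypothetical caller-supplied stand-ins for COMPUTEWIN and COMPUTEUCA, we assume the initial assumptions are True (empty edge sets), and `None` plays the role of "return False".

```python
def ocompute_ge(players, v0, compute_win, compute_uca):
    """Iterative reading of the UCA-based RECURSIVEGE.

    compute_win(i, psi) -> set of vertices   (stand-in for COMPUTEWIN(G, phi'_i, i))
    compute_uca(i, psi) -> (S_new, C_new)    (stand-in for COMPUTEUCA(G, psi_{-i} and phi_i, i))
    psi maps each player i to (S_i, C_i), the edge sets of its current UCA.
    """
    psi = {i: (frozenset(), frozenset()) for i in players}  # assumed start: psi_i = True
    while True:
        # lines 2-5: stop once v0 is winning for every player under the current assumptions
        if all(v0 in compute_win(i, psi) for i in players):
            return psi
        # line 6: strengthen every assumption; conjoining UCAs unions their edge sets
        new_psi = {}
        for i in players:
            S_new, C_new = compute_uca(i, psi)
            S, C = psi[i]
            new_psi[i] = (S | frozenset(S_new), C | frozenset(C_new))
        # lines 7-8: no assumption changed, so no GWSE is found
        if new_psi == psi:
            return None
        psi = new_psi  # line 9: next round
```

The edge sets only grow inside the finite set E, so the loop can make progress only finitely often; this is the intuition behind the termination bound stated next.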


We have the following main result of this section. 


Theorem 3. Let G be a k-player game with game graph $G = (V, E, v_0)$ and
parity specifications $(\varphi_i)_{i \in P}$ such that $(\psi_i)_{i \in P} = $ OCOMPUTEGE(G); then $(\psi_i)_{i \in P}$
is a GWSE for G. Moreover, OCOMPUTEGE terminates in time $O(k^2 |E| (2|V| +
2|E|)^{d+2})$, where d is the number of priorities used in the parity specifications.


Proof. Combining the results from Thm. 1 with Thm. 2 and Prop. 2 gives
us that $(\psi_i)_{i \in P}$ is indeed a GWSE for G. Furthermore, as $\psi_i$ (for all $i \in P$)
in each iteration of the algorithm either remains the same or gains more
unsafe/colive edges, it can change at most 2|E| times. Hence, as there are k players,
the algorithm OCOMPUTEGE terminates within 2k|E| iterations. Moreover,
each iteration involves k calls to both COMPUTEWIN and COMPUTEUCA. Using
Zielonka's algorithm¹ [30] for solving parity games, each of these calls takes
$O((2|V| + 2|E|)^{d+2})$ time for d priorities (by Prop. 2). In total, this gives
us that OCOMPUTEGE terminates in time $O(k^2 |E| (2|V| + 2|E|)^{d+2})$. □


Remark 4. As Anand et al. show that APA’s for games with co-Büchi specifications
(i.e., $\varphi = \lozenge\square T$ for some $T \subseteq V$) are always expressible by UCA’s [1, Thm.
3], we note that COMPUTEAPA and APPROXAPA coincide for such games. This
implies that no over-approximation of assumptions is needed in this case and the
optimizations discussed for COMPUTEUCA and COMPUTEWIN can be directly
applied to APA’s.

We further note that OCOMPUTEGE also efficiently computes GWSE for
games with more expressive specifications than co-Büchi. For instance, all games
discussed in this paper as well as the mutual exclusion protocol discussed in [15]
can be solved by OCOMPUTEGE.


¹ We note that the time complexity is exponential as we use Zielonka’s algorithm [30]
to solve parity games. One can also use a quasi-polynomial algorithm [11] for solving
parity games to get a quasi-polynomial time complexity for OCOMPUTEGE.
Abstract. Parametric Timed Games (PTG) are an extension of the
model of Timed Automata. They allow for the verification and synthesis
of real-time systems, reactive to their environment and depending on
adjustable parameters. Given a PTG and a reachability objective, we
synthesize the values of the parameters such that the game is winning
for the controller. We adapt and implement the On-The-Fly algorithm
for parameter synthesis for PTG. Several pruning heuristics are introduced
to improve termination and speed of the algorithm. We evaluate
the feasibility of parameter synthesis for PTG on two large case studies.
Finally, we investigate the correctness guarantee of the algorithm:
though the problem is undecidable, our semi-algorithm produces all correct
parameter valuations “in the limit”.


1 Introduction 


The seminal model of Timed Automata (TA) [1] equips finite automata with
real-valued clocks, to verify real-time reactive systems. Numerous extensions of
TA have been proposed. Timed Games (TG) [18] distinguish controllable and
uncontrollable actions, to study the interaction of a controller with its environment
(e.g. the plant, an attacker, or a system-under-test). Here, we focus
on reachability objectives, which require a strategy for the controller to schedule
controllable actions such that, no matter which uncontrollable actions the
environment executes and when, a desirable state is reached.

Since precise timing constraints are not always known, one might replace concrete
values by symbolic parameters, to study a whole family of timed systems.
This leads to the model of Parametric Timed Automata (PTA) [2]. The problem
is to find (some or all) values for the parameters such that the system satisfies a
desired property. Most problems on PTA are undecidable [3], in particular the
reachability problem. Several decidable fragments are known, e.g. by restricting
the number of clocks or the positions of the parameters, as in L/U PTA [14].
This paper tackles the parameter synthesis problem for Parametric Timed
Games (PTG) [15] with reachability objectives. We provide the first implementation
of a semi-algorithm for PTG parameter synthesis. It operates on-the-fly,
i.e. it starts solving the game while the symbolic state space is being generated.
To avoid the generation of the full, potentially infinite, state space, we also
implement several state space reductions. These improve the termination and
efficiency of parameter synthesis. In particular, we lift inclusion/subsumption
from TA to PTG, generalize coverage pruning and losing state propagation from
TG to PTG, and port cumulative pruning from PTA to PTG.

Interestingly, unlike the situation in PTA [5] and TG [10], the algorithm for
PTG is not guaranteed to terminate, even if the symbolic state space is finite.
But we claim that if the algorithm terminates, it produces the precise constraints
under which there exists a winning strategy. If the algorithm does not terminate,
the stronger guarantee holds that (in the limit) it produces all valid parameter
valuations, provided the waiting list is handled fairly.

The implementation allows us to study the feasibility of parameter synthesis
for larger case studies in PTG. In particular, we synthesize parameters for the
correctness of a game version of the Bounded Retransmission Protocol and a
parametric version of the Production Cell [19110]. We measure the effectiveness
of the individual pruning heuristics on these case studies. It appears that the
state space reduction techniques are essential for feasible parameter synthesis.


Related Work. For TG, Maler et al. [18] proposed a strategy synthesis algorithm
based on classical reachability games, handling the uncountable set of
clock values using symbolic regions. Cassez et al. [10] improved the efficiency of
TG strategy synthesis by an on-the-fly algorithm working with symbolic
zones, represented by DBMs, as implemented in UPPAAL Tiga [8]. Previous work
on PTG initially focused on decidable subcases, like the case of bounded integers
[16] and the fragment of L/U PTG [15, 17]. The latter two papers also provide
semi-algorithms for general PTG, either based on backward fixed points [17], or
an on-the-fly algorithm [15], directly extending the work on Timed Games [10].
That paper leaves an implementation of the algorithm (and hence an evaluation
on larger case studies) as future work. Our implementation extends the infrastructure
of IMITATOR [4], which so far could only handle PTA. The symbolic
data structure is based on the Parma Polyhedra Library [7].


Contributions. (1) We provide the first implementation of a parameter synthesis
algorithm for PTG (Sec. 4), and integrate this on-the-fly algorithm in the
IMITATOR toolset [4] (Sec. 6).

(2) We devise and implement several pruning heuristics to speed up parameter
synthesis (Sec. 5).

(3) We evaluate the feasibility of parameter synthesis for PTG on two large
case studies, and measure the effect of the various pruning techniques (Sec. 6).

(4) We carefully introduce the model (Sec. 2) and solution principles (Sec. 3),
pointing out several semantic subtleties, and find that the semi-algorithm yields
all valid parameters in the limit (Sec. 4).
2 Model of Parametric Timed Games


A Parametric Timed Game (PTG) is a structure based on timed automata (TA).
Similarly to classical automata, it is composed of locations connected by discrete
transitions. Moreover, it is equipped with clocks. Each location is associated with a
condition on clock valuations (invariant) that must be satisfied while staying
in the location. An action in a timed automaton is either to take a discrete
transition or to let some time pass. Discrete transitions have a guard that must be
satisfied in order to take the transition. In a parametric setting, these conditions
use linear terms over clocks and parameters. Parameters hold an unspecified
value, and remain constant during a run. A discrete transition also has a subset
of clocks which are reset when the transition is taken.

In a two-player timed game, discrete transitions are partitioned into controllable
transitions and uncontrollable environment transitions.


Definition 1 (PTG). A Parametric Timed Game is a tuple of the form $\mathcal{G} =
(L, X, P, Act, T_c, T_u, \ell_0, Inv)$ such that
- L, X, P, Act are sets of locations, clocks, parameters, and transition labels.
- $T = T_c \cup T_u$ is the set of transitions in $L \times G(X, P) \times Act \times \mathcal{P}(X) \times L$,
partitioned into sets $T_c$ of controllable and $T_u$ of uncontrollable transitions
of the form $(\ell, g, a, Y, \ell')$; $\ell, \ell'$ are source and target locations; $g \in G(X, P)$
is the guard (see Def. 4); a is the label; Y is the set of clocks to reset.
- $\ell_0$ is the initial location.
- $Inv : L \to G(X, P)$ associates an invariant with each location.


Example 1. Fig. 1 shows the example of a coffee machine. The controller represents
the coffee machine and the environment represents the user. Uncontrollable
transitions are depicted as dashed arcs. From idle, the user can ask_coffee. This
resets clock y, which measures the time since the request. The machine is then in
preparing_coffee. Action serve_coffee can happen after $p_3$ (a parameter representing
the time to pour the coffee) and no later than $p_4$ after the request. While the
coffee is being prepared, the user may add sugar. Adding sugar does not interrupt
the pouring of the coffee and lasts $p_2$. The coffee cannot be served while


[Figure: coffee machine PTG with locations idle, preparing_coffee, adding_sugar and coffee_served, and transitions ask_coffee, serve_coffee, ask_sugar and sugar_added.]
Fig. 1. Parametric Timed Game of the coffee machine. 
sugar is being added. A situation that may arise is that sugar is being added to
the coffee when the time limit $p_4$ is met, making it impossible for the coffee to
be served on time. To avoid this issue, ask_sugar is disabled after waiting $p_1$.
Our goal is to synthesize the constraints on parameters $p_1$ to $p_4$ for the coffee
to be served on time. Hence, the initial location is set to preparing_coffee, with
both clocks at 0. One possible solution to the problem is $p_1 + p_2 < p_4 \wedge p_3 \le p_4$.


2.1 Semantics of Parametric Timed Games 
A state of a PTG consists of a location and a valuation of clocks and parameters. 


Definition 2 (valuations). A clock valuation is a function $v_X \in \mathbb{R}_{\ge 0}^X$ assigning
a non-negative real value to each clock. A parameter valuation $v_P \in \mathbb{Q}_{\ge 0}^P$ assigns
a non-negative rational value to each parameter. A valuation of the game $\mathcal{G}$ is a pair
$v = (v_X, v_P)$. The set of all valuations of the game is denoted $\mathcal{V} = \mathbb{R}_{\ge 0}^X \times \mathbb{Q}_{\ge 0}^P$.


A guard is a constraint that can be satisfied by some valuations of the game. 


Definition 3 (linear terms). A linear term over P is a term defined by the
following grammar: $plt ::= k \mid k\,p \mid plt + plt$ where $k \in \mathbb{Q}$ and $p \in P$.


Definition 4 (guards). The set of guards G(X, P) is the set of formulas
defined inductively by the following grammar:

$\phi ::= \top \mid \phi \wedge \phi \mid x \sim plt \mid plt' \sim plt$,
where $x \in X$, $\sim \in \{<, \le, =, \ge, >\}$ and plt, plt' are linear terms over P.
We now introduce the notion of zone which will be used to solve a PTG. 


Definition 5 (zones). The set of parametric zones Z(X, P) is the set of
formulas defined inductively by the following grammar:

$\phi ::= \top \mid \phi \wedge \phi \mid x \sim plt \mid x - y \sim plt \mid plt' \sim plt$,
where $x, y \in X$, $\sim \in \{<, \le, =, \ge, >\}$ and plt and plt' are linear terms over P.


Function $v_P$ is naturally extended to linear terms on parameters, by replacing
each parameter in the term with its valuation. With $v \models \phi$, we denote that
valuation $v = (v_X, v_P)$ satisfies a guard or a zone φ, which is defined in the
expected manner. Zones, guards and invariants can also be seen as convex
sets in the space of valuations of the game by considering those valuations that
satisfy the condition.
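The satisfaction relation $v \models \phi$ for guards can be made concrete with exact rational arithmetic. The sketch below covers only the guard grammar of Def. 4 (zones would additionally need $x - y \sim plt$ atoms). The encoding of linear terms as `(constant, {parameter: coefficient})` and of atoms as tagged tuples is our own hypothetical choice, and the example guard $p_3 \le y \le p_4$ is our reading of the serve_coffee transition in Fig. 1.

```python
import operator
from fractions import Fraction

OPS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
       ">=": operator.ge, ">": operator.gt}


def eval_plt(term, v_p):
    """Evaluate a linear term (constant, {parameter: coefficient}) under v_P."""
    const, coeffs = term
    return Fraction(const) + sum(Fraction(k) * Fraction(v_p[p]) for p, k in coeffs.items())


def satisfies(guard, v_x, v_p):
    """v |= guard for a conjunction of atoms (an empty list encodes True).

    ("clock", x, op, plt)       encodes  x ~ plt
    ("params", plt1, op, plt2)  encodes  plt1 ~ plt2
    """
    for atom in guard:
        if atom[0] == "clock":
            _, x, op, plt = atom
            lhs, rhs = Fraction(v_x[x]), eval_plt(plt, v_p)
        else:
            _, plt1, op, plt2 = atom
            lhs, rhs = eval_plt(plt1, v_p), eval_plt(plt2, v_p)
        if not OPS[op](lhs, rhs):
            return False
    return True


# Assumed guard of serve_coffee: p3 <= y <= p4.
serve = [("clock", "y", ">=", (0, {"p3": 1})), ("clock", "y", "<=", (0, {"p4": 1}))]
```

Using `Fraction` keeps the evaluation exact for rational parameter valuations, matching $v_P \in \mathbb{Q}_{\ge 0}^P$.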

Transitions modify clock valuations by letting time pass or resetting clocks. 


Definition 6 (time delays). Let $v = (v_X, v_P)$ be a valuation of the game and
$\delta \ge 0$ a delay.

- $\forall x \in X : (v_X + \delta)(x) = v_X(x) + \delta$

- $v + \delta = (v_X + \delta, v_P)$
Definition 7 (clock resets). Let $v = (v_X, v_P)$ be a valuation of the game and
$Y \subseteq X$. $v_X[Y := 0]$ is the valuation obtained by resetting the clocks in Y, i.e.:

- $\forall x \in Y : v_X[Y := 0](x) = 0$ and $\forall x \in X \setminus Y : v_X[Y := 0](x) = v_X(x)$
- $v[Y := 0] = (v_X[Y := 0], v_P)$


We can now define the semantics of a Parametric Timed Game. 


Definition 8 (state). A state of a PTG is a pair (ℓ, v) where ℓ is a location
and v a valuation of the game satisfying its invariant: $v \models Inv(\ell)$. The state
space is then $\mathcal{S} = \{(\ell, v) \in L \times \mathcal{V} \mid v \models Inv(\ell)\} = \bigcup_{\ell \in L} \{\ell\} \times Inv(\ell)$.


From a state in this state space, timed and discrete transitions can happen. 


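Definition 9 (timed and discrete transitions). Let $\delta \in \mathbb{R}_{\ge 0}$ be a time delay.
A timed transition is a relation $\xrightarrow{\delta} \subseteq \mathcal{S} \times \mathcal{S}$ s.t. $\forall (\ell, v), (\ell', v') \in \mathcal{S} : (\ell, v) \xrightarrow{\delta}
(\ell', v')$ iff $\ell' = \ell$ and $v' = v + \delta$.
Let $t = (\ell, g, a, Y, \ell') \in T$ be a transition. A discrete transition is a relation
$\xrightarrow{t} \subseteq \mathcal{S} \times \mathcal{S}$ s.t. $\forall (\ell, v), (\ell', v') \in \mathcal{S} : (\ell, v) \xrightarrow{t} (\ell', v')$ iff $v \models g$ and $v' = v[Y := 0]$.
Let $\mathbf{0}$ be the clock valuation where all clocks have value 0. The set of possible
initial states of the PTG is $\mathcal{S}_0 = \{(\ell_0, (\mathbf{0}, v_P)) \mid v_P \in \mathbb{Q}_{\ge 0}^P : (\mathbf{0}, v_P) \models Inv(\ell_0)\}$.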


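Definition 10 (run). A run of the PTG $\mathcal{G}$ is a finite or infinite sequence of
states $s_0 s_1 s_2 \ldots$ s.t. $s_0 \in \mathcal{S}_0$ and $\forall i \in \mathbb{N}, s_{2i} \xrightarrow{\delta} s_{2i+1} \xrightarrow{t} s_{2i+2}$. $\mathcal{R}(\mathcal{G})$ denotes
the set of runs, and $\mathcal{R}(\mathcal{G})(s)$ the set of those starting from state s.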


A run alternates between (potentially null) delays and discrete transitions, 
avoiding runs that let only time pass. However, there might still be Zeno runs 
where infinitely many discrete transitions are taken in a finite amount of time. 
When there is no ambiguity, we omit G in the notations. 


Example 2. Let us consider again the coffee machine in Fig. 1. Assume the parameter
valuations are: $v_P(p_1) = 5$, $v_P(p_2) = 2$, $v_P(p_3) = 5$ and $v_P(p_4) = 6$. Let
$v_X = (v_X(x), v_X(y))$. We get the sequence: (preparing_coffee, ((0, 0), $v_P$)) $\xrightarrow{4}$
(preparing_coffee, ((4, 4), $v_P$)) $\xrightarrow{ask\_sugar}$ (adding_sugar, ((0, 4), $v_P$)) $\xrightarrow{2}$
(adding_sugar, ((2, 6), $v_P$)) $\xrightarrow{sugar\_added}$ (preparing_coffee, ((2, 6), $v_P$)).
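The run of Example 2 can be replayed with the two clock operations of Def. 6 and Def. 7. The sketch below tracks only the clock valuation. The guards checked along the way ($y < p_1$ for ask_sugar and $x = p_2$ for sugar_added) are our reading of Fig. 1 and Example 1, not definitions taken from the paper, and the reset of x on ask_sugar follows the valuations listed in the example.

```python
def delay(v_x, d):
    """Def. 6: let d >= 0 time units pass; every clock advances by d."""
    if d < 0:
        raise ValueError("delays must be non-negative")
    return {x: val + d for x, val in v_x.items()}


def reset(v_x, Y):
    """Def. 7: v_X[Y := 0]; clocks in Y are set to 0, the others keep their value."""
    return {x: (0 if x in Y else val) for x, val in v_x.items()}


v_p = {"p1": 5, "p2": 2, "p3": 5, "p4": 6}   # parameter valuation of Example 2
v_x = {"x": 0, "y": 0}                        # initial state in preparing_coffee
v_x = delay(v_x, 4)                           # (4, 4)
assert v_x["y"] < v_p["p1"]                   # assumed ask_sugar guard: y < p1
v_x = reset(v_x, {"x"})                       # ask_sugar resets x: (0, 4)
v_x = delay(v_x, 2)                           # (2, 6)
assert v_x["x"] == v_p["p2"]                  # assumed sugar_added guard: x = p2
print((v_x["x"], v_x["y"]))                   # (2, 6)
```

The final valuation (2, 6) coincides with the last state of Example 2; note that y has already reached $p_4 = 6$ at this point, illustrating the timing issue discussed in Example 1.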


Definition 11 (history). A history is a finite prefix of a run. The set of histories
of game $\mathcal{G}$ is denoted $\mathcal{H}(\mathcal{G})$, and those starting in state s by $\mathcal{H}(\mathcal{G})(s)$.


The notion of coverage allows for capturing all states that can occur up to
some time, without a discrete transition.


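Definition 12 (coverage). Let $s, s' \in \mathcal{S}$ and $\delta \ge 0$ such that $s \xrightarrow{\delta} s'$. The
coverage of the timed transition is the set of intermediate states traversed:
$Cover(s \xrightarrow{\delta} s') = \{s'' \in \mathcal{S} \mid \exists \delta' : 0 \le \delta' \le \delta \wedge s \xrightarrow{\delta'} s''\}$.

The coverage of state s is the set of states obtained from s with timed transitions
only: $Cover(s) = \{s' \in \mathcal{S} \mid \exists \delta \ge 0 : s \xrightarrow{\delta} s'\}$.

The coverage of a run $r = s_0 s_1 s_2 \ldots$ is the union of the coverage of its
timed transitions. When finite, it includes the coverage of its last state ls(r):
$Cover(r) = \left(\bigcup_{i \in \mathbb{N}} Cover(s_{2i} \xrightarrow{\delta} s_{2i+1})\right) \cup Cover(ls(r))$.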
Definition 13 (reachability objective and winning runs). Let $R \subseteq L$ be a
reachability objective. The set of winning runs $\Omega_{reach}(R)$ is the subset of runs
that visit R: $\Omega_{reach}(R) = \{r \in \mathcal{R} \mid \exists \ell \in R, \exists v \in \mathcal{V} : (\ell, v) \in Cover(r)\}$.


Example 3. In the coffee machine, the objective is to reach the location coffee_served
from the initial location preparing_coffee. The reachability objective is thus
R = {coffee_served}, and the set of winning runs is $\Omega_{reach}(\{coffee\_served\})$.


2.2 Strategies in Parametric Timed Games 


We introduce a definition of strategy that deviates from [10], where at each
moment, a player decides either to wait or to take a discrete transition, so their
strategy returns values in $T \cup \{wait\}$. The problem with their strategy is that it
is not always clear what should happen: for instance, given a delay δ, a history
$h = s_0 \xrightarrow{\delta} s$, and a strategy σ, where σ(h) = wait for $0 < \delta \le 1$ and σ(h) = t
for δ > 1, it is not clear when transition t happens: there is no minimal δ > 1.
Although this works formally, it is less clear what precisely the allowed behaviour of the
winning player is. For that reason, in our definition of strategy, players
must decide in advance which delay they will take. This makes the definition
more constructive, clarifying what move the winning player will actually take
(i.e. perform an action or decide to wait for some particular time), and in the
end simplifies the definition of what is winning.

Furthermore, following [10], the definition of strategy is asymmetric for controller
and environment: if both wish to take a discrete transition, we give
priority to the environment; this corresponds to the safest situation from a software
controller's point of view. Another subtle asymmetry is that the controller
cannot assume that the environment will take some uncontrollable transition,
even when waiting any longer would violate the location invariant. While this
is in line with the formal definition of strategy in TG [10], experiments with
UPPAAL Tiga [8] reveal that in that tool, an uncontrollable discrete transition
is actually forced when reaching the boundary of violating an invariant.


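Definition 14 (strategy). A controller strategy $\sigma_c$ (resp. environment strategy
$\sigma_e$) models decision-making. It is a function, depending on a history, deciding
either to wait some amount of time (possibly infinite) or to take a discrete
transition: $\sigma_c : \mathcal{H} \to \mathbb{R}_{\ge 0}^{\infty} \cup T_c$, $\sigma_e : \mathcal{H} \to \mathbb{R}_{\ge 0}^{\infty} \cup T_u$ s.t. $\forall h \in \mathcal{H}$ and $\sigma \in \{\sigma_c, \sigma_e\}$,
1. If $\sigma(h) = (\ell, g, a, Y, \ell') \in T$
then $ls(h) = (\ell, v)$ such that $v \models g$ and $v[Y := 0] \models Inv(\ell')$
2. If $\sigma(h) = \delta \in \mathbb{R}_{>0}$ and the transition $\xrightarrow{\delta}$ is available in ls(h)
then $\sigma(h \xrightarrow{\delta} s) \in T$
where $h \xrightarrow{\delta} s$ denotes the history obtained by adding the delay δ at the end of h.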


A strategy can return a discrete transition if its guard is satisfied and the
resulting state satisfies the destination invariant (1). To respect the alternation
between timed and discrete transitions, we require that a strategy which returns
a finite delay δ > 0 on a history returns a discrete transition after the delay (if
the run did not stop by violating an invariant) (2).
A controller strategy $\sigma_c$ and an environment strategy $\sigma_e$ can be combined
into a global strategy $\sigma_{(\sigma_c, \sigma_e)}$ as follows. If both players try to take a transition,
we consider that the controller cannot guarantee that its transition will be taken, thus
the environment chooses. If one player decides on a discrete transition while the
other decides to wait, the discrete transition is taken. If both players decide to
wait, we wait for the smallest delay.


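Definition 15 (global strategy). Let $\sigma_c$ be a controller strategy and $\sigma_e$ an
environment strategy. For all $h \in \mathcal{H}$, the global strategy $\sigma_{(\sigma_c, \sigma_e)}$ is defined by:

- $\sigma_e(h) = t_u \in T_u \Rightarrow \sigma_{(\sigma_c, \sigma_e)}(h) = t_u$

- $\sigma_c(h) = t_c \in T_c \wedge \sigma_e(h) = \delta \ge 0 \Rightarrow \sigma_{(\sigma_c, \sigma_e)}(h) = t_c$

- $\sigma_c(h) = \delta \ge 0 \wedge \sigma_e(h) = \delta' \ge 0 \Rightarrow \sigma_{(\sigma_c, \sigma_e)}(h) = \min(\delta, \delta')$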


Example 4. Let us look at possible strategies in location preparing coffee of the running example. The machine can choose the action serve coffee, while the user can select the action ask sugar. If both want to take an action, the global strategy chooses ask sugar, thus giving priority to the user. If only one of them wants to take an action and the other waits, the action is taken. Hence, the machine can serve coffee if the user is waiting. This is the expected behavior of a coffee machine and its user.
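The combination rule of Definition 15 can be sketched in a few lines of Python. This is only an illustration under our own assumptions: decisions are modelled by the hypothetical classes `Delay` and `Edge`, and histories are left out because the rule only compares the two decisions taken on the same history. The last lines replay the choices of Example 4.

```python
import math
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Delay:
    """Decision to wait; `amount` is positive and may be math.inf."""
    amount: float


@dataclass(frozen=True)
class Edge:
    """Decision to take a discrete transition (hypothetical representation)."""
    name: str
    controllable: bool


Decision = Union[Delay, Edge]


def combine(ctrl: Decision, env: Decision) -> Decision:
    """Global decision of Definition 15 for one history."""
    if isinstance(ctrl, Edge) and not ctrl.controllable:
        raise ValueError("the controller may only choose controllable transitions")
    if isinstance(env, Edge) and env.controllable:
        raise ValueError("the environment may only choose uncontrollable transitions")
    if isinstance(env, Edge):
        # The environment acts: its transition is taken, whatever the controller chose.
        return env
    if isinstance(ctrl, Edge):
        # Only the controller acts while the environment waits.
        return ctrl
    # Both wait: the smaller delay elapses first.
    return Delay(min(ctrl.amount, env.amount))


# Example 4, location "preparing coffee".
serve = Edge("serve coffee", controllable=True)
ask = Edge("ask sugar", controllable=False)
both_act = combine(serve, ask)                # the user has priority
user_waits = combine(serve, Delay(math.inf))  # the machine serves
both_wait = combine(Delay(2.0), Delay(5.0))   # the shorter delay wins
```

Giving the environment's transition precedence in the first test is exactly what makes the controller unable to "race" the environment.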


The global strategy induces a unique run, introducing null delays between two discrete transitions to guarantee the alternation with timed transitions.


Definition 16 (run induced by a global strategy). Let an initial state $s_0$ and a global strategy $\sigma$ be given. The run induced by strategy $\sigma$ is the unique $r_\sigma = s_0 s_1 s_2 \dots$ obtained by:
- If $i$ is even, the next transition is a timed transition:
  • If $\sigma((s_0, \dots, s_i)) = t \in T$, a delay 0 is added: $s_i \xrightarrow{0} s_{i+1} \xrightarrow{t} s_{i+2}$.
  • If $\sigma((s_0, \dots, s_i))$ returns a delay $\delta > 0$ and there is a unique state $s$ such that $s_i \xrightarrow{\delta} s$ (invariant not violated), then $s_{i+1} = s$.
  • Otherwise, the invariant is violated and the run ends.
- If $i$ is odd, the next transition is a discrete transition. By the properties of a strategy, $\sigma((s_0, \dots, s_i))$ returns a transition $t$ such that there is a unique state $s$ where $s_i \xrightarrow{t} s$. Then, $s_{i+1} = s$.


Definition 17 (winning strategy). A controller strategy $\sigma_c$ is said to be winning from a state $s \in S$ w.r.t. a reachability objective $R$ if and only if all runs starting in $s$ and adhering to $\sigma_c$ are winning w.r.t. the objective. State $s$ is said to be winning if there exists a winning strategy from it.

Run $r$ is adhering to a controller strategy $\sigma_c$ if there exists an environment strategy $\sigma_e$ such that $r = r_{\sigma_{(\sigma_c,\sigma_e)}}$.


The question we now aim to answer is: given a Parametric Timed Game $G$ and a reachability objective $R$, is there a winning controller strategy from the initial state? The answer depends on the values of the parameters. So, more precisely, we are interested in the question: for which parameter valuations is the corresponding initial state winning?


OTF Algorithm for Reachability in PTG 201 
3 Solving the Game 


In this section, we introduce necessary elements for solving a game. We first 
describe the symbolic state space on which the algorithm operates. Then we 
characterize the set of winning states as a nested fixed point. 


3.1 Parametric Zone Graph 


Since clock valuations assign real numbers, the timed transition system of a PTG has an uncountable number of states. Zones (cf. Def. 5) are a practical tool to regroup these states in more manageable sets. Recall that zones (like guards and invariants) are conjunctions of simple constraints on valuations, and can be viewed as sets of valuations. Our algorithms operate on symbolic states $\xi = (\ell, Z)$, which consist of a location and a zone. We require that $Z \subseteq Inv(\ell)$. For instance, the set of initial states $\xi_0$ of a PTG (cf. Def. 10) can be described by the symbolic state $(\ell_0, Inv(\ell_0) \wedge \bigwedge_{x \in X} x = 0)$.

In the notation, we identify a symbolic state $(\ell, Z)$ with its semantics as the set of concrete states $\{(\ell, v) \mid v \in Z\} \subseteq S$. We write $\xi.\ell$ to denote the (common) location of a symbolic state $\xi$. Zones are closed under the following operations, which we extend to symbolic states:


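- Intersection between sets

- Temporal successors: $\xi^{\nearrow} = \{s' \in S \mid \exists s \in \xi, s \xrightarrow{\delta} s'\}$

- Temporal predecessors: $\xi^{\swarrow} = \{s' \in S \mid \exists s \in \xi, s' \xrightarrow{\delta} s\}$

- Discrete successors: $Succ(t, \xi) = \{s' \in S \mid \exists s \in \xi, s \xrightarrow{t} s'\}$

- Discrete predecessors: $Pred(t, \xi) = \{s' \in S \mid \exists s \in \xi, s' \xrightarrow{t} s\}$

- Projection onto parameters: $\xi{\downarrow_P} = \{v_P \mid \exists v_X, \ell, (\ell, (v_X, v_P)) \in \xi\}$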


These operations can be implemented by standard operations on convex polyhedra [7]. We also use union, set complement and set difference, which can return non-convex shapes. These are represented as unions of zones, still denoting sets of concrete states. All previous operations are extended to unions of zones. Our algorithms operate on the Parametric Zone Graph (PZG). The PZG of a PTG is not guaranteed to be finite, so our algorithms are in fact semi-algorithms.
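For intuition, here is a sketch of the zone operations above in the simplest possible setting: one clock, no parameters, closed bounds only, and trivial invariants. These are all simplifying assumptions of ours. The actual algorithms work on parametric zones represented as convex polyhedra. A zone is then an interval of clock values, and a transition is described by a guard interval and a flag saying whether the clock is reset.

```python
import math
from typing import Optional, Tuple

# One-clock zone: closed interval (lo, hi) with 0 <= lo <= hi <= inf, or None if empty.
Zone = Optional[Tuple[float, float]]


def intersect(a: Zone, b: Zone) -> Zone:
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def time_succ(z: Zone) -> Zone:
    """Temporal successors: let any amount of time elapse (no invariant)."""
    return None if z is None else (z[0], math.inf)


def time_pred(z: Zone) -> Zone:
    """Temporal predecessors: all clock values that can delay into z."""
    return None if z is None else (0.0, z[1])


def succ(guard: Zone, reset: bool, z: Zone) -> Zone:
    """Discrete successors through a transition with this guard and reset."""
    enabled = intersect(z, guard)
    if enabled is None:
        return None
    return (0.0, 0.0) if reset else enabled


def pred(guard: Zone, reset: bool, z: Zone) -> Zone:
    """Discrete predecessors: values satisfying the guard whose image lies in z."""
    if z is None:
        return None
    if reset:
        # After the reset the clock is 0, so the target zone must contain 0.
        return guard if z[0] <= 0.0 <= z[1] else None
    return intersect(z, guard)


init = (0.0, 0.0)
reached = time_succ(succ((2.0, 5.0), False, time_succ(init)))
back = time_pred(pred((2.0, 5.0), False, (3.0, 4.0)))
```

Starting from the initial zone, letting time pass, and taking a transition guarded by $2 \le x \le 5$ yields $x \ge 2$. Going backwards from $3 \le x \le 4$ yields $0 \le x \le 4$.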


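Definition 18 (Parametric Zone Graph). Given a PTG of the form $G = (L, X, P, Act, T_c, T_u, \ell_0, Inv)$, its Parametric Zone Graph is defined as the tuple $(\Xi, \xi_0^{\nearrow}, \Rightarrow_c, \Rightarrow_u)$, where $\Xi \subseteq 2^S$ and, for all $\xi, \xi' \in \Xi$, we have $\xi \Rightarrow_c^t \xi'$ if $\xi' = Succ(t, \xi)^{\nearrow}$ and $t \in T_c$, and $\xi \Rightarrow_u^t \xi'$ if $\xi' = Succ(t, \xi)^{\nearrow}$ and $t \in T_u$.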


3.2 Alternating Fixed Point Property 


The algorithm works by alternating between exploring new states and back-propagating winning-state information from discovered winning states, starting from target states. The exploration relies on a fixed point property of the set $Reach(\xi_0)$, defined as all symbolic states in some run from an initial state in $\xi_0$:

Lemma 1 (from [15]). $Reach(\xi_0)$ is the smallest set $\mathcal{S}$ containing $\xi_0^{\nearrow}$ such that $\forall t \in T, Succ(t, \mathcal{S})^{\nearrow} \subseteq \mathcal{S}$.
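Lemma 1 suggests the usual worklist computation of a least fixed point: start from the initial symbolic state and keep adding successors until nothing new appears. The sketch below assumes, for illustration only, that symbolic states are hashable values and that a hypothetical `successors` function plays the role of $Succ(t, \cdot)^{\nearrow}$ over all transitions $t$. Like any exploration of the PZG, the loop terminates only if finitely many symbolic states are reachable.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, Set


def reach(initial: Hashable,
          successors: Callable[[Hashable], Iterable[Hashable]]) -> Set[Hashable]:
    """Smallest set containing `initial` and closed under `successors`."""
    seen = {initial}
    todo = deque([initial])
    while todo:
        current = todo.popleft()
        for nxt in successors(current):
            if nxt not in seen:  # each state is queued at most once
                seen.add(nxt)
                todo.append(nxt)
    return seen


# Toy graph standing in for a finite zone graph; "d" is unreachable from "a".
graph = {"a": ["b"], "b": ["c", "a"], "c": [], "d": ["a"]}
reachable = reach("a", lambda s: graph[s])
```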


Similarly, the set of winning states W(R) of the PTG with reachability ob- 
jective R can be computed as a fixed point. Intuitively, we can win the game if 
we can take a temporal transition (without being diverted by an uncontrollable 
action leading us to a non-winning state) to a state that is either directly win- 
ning, or has a controllable transition to a winning state. We formalize this with 
three operators on sets of states. Let W be the set of winning states of the game. 

We call $WinningMoves(S) = \{s \mid \exists t \in T_c, \exists s' \in S, s \xrightarrow{t} s'\}$ the set of states that have access to a controllable action leading to $S$. When applied to $W$, it gives us the states with a controllable action to reach $W$, from which we have a winning strategy. $WinningMoves(S)$ is increasing in $S$. It can be computed using the previous operators as $WinningMoves(S) = \bigcup_{t_c \in T_c} Pred(t_c, S)$.

We call $Uncontrollable(S) = \{s \mid \exists t \in T_u, \exists s' \notin S, s \xrightarrow{t} s'\}$ the set of states where an uncontrollable action leads to a state outside $S$. When applied to $W$, it gives us the states where the environment can derail us into a state outside $W$, from which we have no winning strategy. $Uncontrollable(S)$ is decreasing in $S$. It can be computed from the operators of the previous subsection by $Uncontrollable(S) = \bigcup_{t_u \in T_u} Pred(t_u, \overline{S})$, where $\overline{S}$ denotes the complement of $S$.
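To make the two operators concrete, the following sketch evaluates them on an explicit, finite transition relation and ignores time and zones entirely. The edge list, state names and action sets are hypothetical. Only the definitions of WinningMoves and Uncontrollable are taken from the text, with $Pred$ computed directly from the edges.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str, str]  # (source, action, target)


def pred(edges: Iterable[Edge], actions: Set[str], target: Set[str]) -> Set[str]:
    """Sources of edges labelled by one of `actions` that lead into `target`."""
    return {src for (src, act, dst) in edges if act in actions and dst in target}


def winning_moves(edges, controllable: Set[str], S: Set[str]) -> Set[str]:
    # Union over controllable actions t_c of Pred(t_c, S).
    return pred(edges, controllable, S)


def uncontrollable(edges, uncontrollable_actions: Set[str], S: Set[str],
                   all_states: Set[str]) -> Set[str]:
    # Union over uncontrollable actions t_u of Pred(t_u, complement of S).
    return pred(edges, uncontrollable_actions, all_states - S)


states = {"p", "q", "goal", "bad"}
edges = [("p", "serve", "goal"), ("p", "ask", "q"), ("q", "fail", "bad")]
ctrl, unctrl = {"serve"}, {"ask", "fail"}
W = {"goal"}
```

Here `p` belongs to both sets for `W = {"goal"}`. It has a controllable move into `W`, but the environment can divert it to `q`, which lies outside `W`.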

Finally, we call $SafePred(S_1, S_2)$ the set of states that can reach $S_1$ by a temporal transition while avoiding $S_2$. Since it is meant to be applied to reach winning moves while avoiding uncontrollable actions, if a state is in the intersection of $S_1$ and $S_2$, priority is given to the environment and the state is not considered safe: $SafePred(S_1, S_2) = \{s \mid \exists s' \in S_1, s \xrightarrow{\delta} s' \wedge Cover(s \xrightarrow{\delta} s') \cap S_2 = \emptyset\}$. $SafePred(S_1, S_2)$ is increasing in $S_1$ and decreasing in $S_2$.

Thanks to the work of Cassez et al. [10], SafePred can be computed between zones using the preceding operations, and extended to unions of zones:


Lemma 2 (from [10] for TG, [17] for PTG).
$SafePred(S_1, S_2) = (S_1^{\swarrow} \setminus S_2^{\swarrow}) \cup ((S_1 \cap S_2^{\swarrow}) \setminus S_2)^{\swarrow}$

$SafePred(\bigcup_i S_1^i, \bigcup_j S_2^j) = \bigcup_i \bigcap_j SafePred(S_1^i, S_2^j)$
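Lemma 2 can be sanity-checked in a discrete-time toy model, which is our own simplification. There is a single clock whose values are the integers 0 to `HORIZON`, and delaying from $i$ to $j \ge i$ visits exactly the points $i, \dots, j$. Zones become integer intervals and $S^{\swarrow}$ is the set of points at or before some point of $S$. The sketch compares the definition of SafePred with both equations of Lemma 2.

```python
from typing import List, Set

HORIZON = 8  # hypothetical time bound of the toy model
ALL = set(range(HORIZON + 1))


def interval(lo: int, hi: int) -> Set[int]:
    return set(range(lo, hi + 1))


def down(S: Set[int]) -> Set[int]:
    """Temporal predecessors: points that can delay into S."""
    return interval(0, max(S)) if S else set()


def safe_pred_def(G: Set[int], B: Set[int]) -> Set[int]:
    """Definition: delay from i to some j in G without visiting B (endpoints included)."""
    return {i for i in ALL
            if any(j in G and not (interval(i, j) & B) for j in range(i, HORIZON + 1))}


def safe_pred_convex(G: Set[int], B: Set[int]) -> Set[int]:
    """First equation of Lemma 2, for convex G and B."""
    return (down(G) - down(B)) | down((G & down(B)) - B)


def safe_pred_union(Gs: List[Set[int]], Bs: List[Set[int]]) -> Set[int]:
    """Second equation of Lemma 2: union over G_i of the intersection over B_j."""
    result: Set[int] = set()
    for G in Gs:
        part = down(G)  # SafePred(G, B) is always included in down(G)
        for B in Bs:
            part &= safe_pred_convex(G, B)
        result |= part
    return result
```

Convexity matters for the union rule. For a convex $S_1^i$, the earliest reachable point of $S_1^i$ has the smallest cover. Therefore, one witness works for every $S_2^j$ at once.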


We can now formulate the fixed point property followed by W. 


Lemma 3. $W(R)$ is the smallest set $S$ containing $R$ such that
$SafePred(S \cup WinningMoves(S), Uncontrollable(S)) \subseteq S$.


Proof. See [12, App. B].


4 Algorithm and Correctness 


We can now introduce the algorithm for parameter synthesis for PTG. Alg. 1 explores the state space and creates a map of symbolic states connected by
Algorithm 1 For PTG $G = (L, X, P, Act, T_c, T_u, \ell_0, Inv)$ and reachability objective $R$, returns the set of all parameter valuations that win the game.

1: Explored, WaitingUpdate, WaitingExplore := $\emptyset, \emptyset, \{\xi_0^{\nearrow}\}$    ▷ Symbolic state sets
2: Win := {}    ▷ Map from symbolic states to unions of zones
3: Depends := {}    ▷ Map from symbolic states to sets of symbolic states
4: WinningParam := False
5: function SOLVEPTG
6:     while $\neg$TERMINATE() do
7:         Choose either EXPLORE() or UPDATE()
8:     return WinningParam
9: procedure EXPLORE
10:    $\xi \leftarrow$ extract(WaitingExplore)
11:    for $t$ transition from $\xi$ do
12:        $\xi' := Succ(t, \xi)^{\nearrow}$
13:        Depends[$\xi'$] $\leftarrow$ Depends[$\xi'$] $\cup \{\xi\}$
14:        if $\xi'$ not in Explored then
15:            WaitingExplore $\leftarrow$ WaitingExplore $\cup \{\xi'\}$
16:    if $\xi.\ell \in R$ then
17:        Win[$\xi$] $\leftarrow \xi$
18:        WaitingUpdate $\leftarrow$ WaitingUpdate $\cup$ Depends[$\xi$]
19:    WaitingUpdate $\leftarrow$ WaitingUpdate $\cup \{\xi\}$
20:    Explored $\leftarrow$ Explored $\cup \{\xi\}$
21: procedure UPDATE
22:    $\xi \leftarrow$ extract(WaitingUpdate)
23:    Uncontrollable $\leftarrow \bigcup_{\xi \Rightarrow_u^t \xi'} Pred(t, \xi' \setminus Win[\xi'])$
24:    WinningMoves $\leftarrow \bigcup_{\xi \Rightarrow_c^t \xi'} Pred(t, Win[\xi'])$
25:    NewWin := SafePred(Win[$\xi$] $\cup$ WinningMoves, Uncontrollable) $\cap\ \xi$
26:    if NewWin $\not\subseteq$ Win[$\xi$] then
27:        WaitingUpdate $\leftarrow$ WaitingUpdate $\cup$ Depends[$\xi$]
28:        Win[$\xi$] $\leftarrow$ Win[$\xi$] $\cup$ NewWin
29:        WinningParam $\leftarrow (Win[\xi] \cap \xi_0){\downarrow_P}$
30: function TERMINATE
31:    return WaitingExplore $= \emptyset \wedge$ WaitingUpdate $= \emptyset$
a discrete transition through the operation $Succ(t, \cdot)^{\nearrow}$. Simultaneously, any newly found winning states in a symbolic state $\xi$, starting from the target locations, are propagated by marking the predecessors of $\xi$ for an update. To update a symbolic state $\xi$, we compute $SafePred(Win \cup WinningMoves(Win), Uncontrollable(Win))$ within $\xi$ and add the result to $Win[\xi]$. If new winning states are found, we mark the predecessors of $\xi$ for an update.

The algorithm is non-deterministic: it does not describe how we choose between explore and update, or which symbolic state in the waiting lists to explore or update. These choices are left abstract on purpose, as optimization opportunities. A fair strategy would be to join WaitingExplore and WaitingUpdate in a single queue, whose head determines which operation to apply next. In our implementation, we prioritized back-propagation from WaitingUpdate.


4.1 Invariants and Correctness 


Recall that the algorithm works on a zone graph. We are looking for subsets of winning states within symbolic states. The same state may appear in different symbolic states and may not have the same status in each instance. Therefore, the set $W_{temp}$ of winning states found by the algorithm so far, and the set $W$ of all winning states, also take into account the symbolic state considered. Formally, $W$ consists of all pairs $(\xi, s)$ where $s$ is a winning state contained in the symbolic state $\xi$, and $W_{temp} = \bigcup_{\xi \in Explored} \{\xi\} \times Win[\xi]$.

Theorem 1. These invariants hold during the execution of the algorithm: 

1. $\xi_0^{\nearrow} \in$ WaitingExplore $\cup$ Explored.

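2. $\forall \xi \in$ Explored, $t \in T$, $\xi'$: if $\xi \xrightarrow{t} \xi'$, then $\xi' \in$ WaitingExplore $\cup$ Explored.

3. $\forall \xi \in$ Explored: if $\xi.\ell \in R$, then $\{\xi\} \times \xi \subseteq W_{temp}$.

4. $W_{temp} \subseteq W$.

5. $\forall \xi \in$ Explored, we have either $\xi \in$ WaitingUpdate or $SafePred(W_{temp} \cup WinningMoves(W_{temp}), Uncontrollable(W_{temp})) \cap (\{\xi\} \times \xi) \subseteq W_{temp}$.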


Proof. See [12, App. B].


Invariant 4 guarantees that, even if the algorithm times out, the winning states found by the algorithm are indeed winning. Furthermore, if the algorithm terminates and the waiting lists are empty, we can apply the fixed point properties of $Reach(\xi_0)$ and $W$, and $W_{temp}$ corresponds exactly to $W$ over the explored symbolic states that cover $Reach(\xi_0)$.


Theorem 2. Alg. 1 is correct (when it terminates).

Proof. See [12, App. B].

Example 5. For the coffee machine, the PZG is only finite after applying inclusion subsumption (Sec. 5). However, even on this finite PZG, Alg. 1 does not terminate, but keeps reporting solutions at Line 29. In fact, it produces increasingly more general solutions, including n p» > p; (for any n > 0). If we bound


these parameters in the initial specification, for instance p; > 1 A pı < 5, our 
algorithm synthesizes the extra constraints p; + po < P4 A ps < py, as expected. 
Theorem 3. Provided the waiting lists are treated fairly, any explored winning state is eventually discovered as winning by the algorithm.


Proof (sketch). For a classical TG, we can represent the underlying TA as a finite classical automaton (e.g. the region graph). On this automaton, we can define a (finite) turn-based reachability game equivalent to the initial TG. Hence, we can use the notion of discrete distance to target in a reachability game, corresponding to the smallest number of discrete transitions in which a controller can ensure reaching a target. This is equivalent to solving a Min-Cost Reachability game as studied in [9], where delay transitions have weight 0 and discrete transitions have weight 1. The game graph is finite and the weights are non-negative, so the discrete distance to target of a winning state is finite.

While the same construction is not necessarily finite in a PTG, any state of a PTG is a state $(\ell, (v_P, v_X))$ of the TG where all parametric linear terms in guards have been replaced by their valuation through $v_P$. Therefore, this result extends to winning states of a PTG.

Let $s$ be an explored winning state of the PTG and $n$ its distance to target. We only need to explore states reachable in $n$ discrete transitions from $s$. By invariant 2 of Thm. 1, when all states reachable in $k$ discrete transitions are explored, all states reachable in $k + 1$ discrete transitions are either already explored or in the exploration waiting list. Assuming fairness of the waiting lists, at some time they have all been explored. Therefore, at some time, all states reachable from $s$ in $n$ discrete steps have been explored.

When all states reachable in $n$ discrete steps have been explored, all target states within are discovered. Those are states with distance to target 0. For $0 \le k < n$, when all winning states reachable in less than $n - k$ discrete transitions from $s$ and with a distance to target less than $k$ are discovered, then all winning states reachable in less than $n - (k + 1)$ discrete transitions from $s$ and with a distance to target less than $k + 1$ are discoverable by update. Using invariant 5 of Thm. 1, those states are either already discovered as winning or they are in the update waiting list. Assuming fairness of the waiting lists, at some time they have all been discovered winning. Applying this recurrence until $k = n$, we get that there is a time where $s$ is discovered winning.


We can guarantee: (1) All winning parameter valuations reported in Line 29 are correct, since the algorithm satisfies the invariants of Thm. 1. (2) Every winning parameter valuation will eventually be reported, provided the waiting lists are treated fairly. Hence, Alg. 1 is “sound and complete in the limit” [5].


5 Optimizations 


We present four optimizations to the algorithm presented in Section 4. All of them adapt optimizations from previous works: three of them (coverage pruning, inclusion checking and losing state propagation) from Cassez et al. [10] and one of them (cumulative pruning) from André et al. [5]. We start by updating the exploration procedure to include the optimizations, as shown in Alg. 2.
Algorithm 2 Adding optimizations to the explore procedure

1: procedure EXPLORE
2:     $\xi \leftarrow$ extract(WaitingExplore)
3:     if $\xi{\downarrow_P} \subseteq$ WinningParam then    ▷ Cumulative pruning
4:         return
5:     if $\xi.\ell \in R$ then
6:         Win[$\xi$] $\leftarrow \xi$
7:         WaitingUpdate$_W \leftarrow$ WaitingUpdate$_W \cup$ Depends[$\xi$]
8:     if controller_deadlock($\xi$) $\wedge$ Win[$\xi$] $\neq \xi$ then    ▷ Losing state propagation
9:         WaitingUpdate$_L \leftarrow$ WaitingUpdate$_L \cup$ Depends[$\xi$]
10:    if Win[$\xi$] $= \xi \vee$ controller_deadlock($\xi$) then    ▷ Coverage pruning
11:        return
12:    for $t$ transition from $\xi$ do
13:        $\xi' := Succ(t, \xi)^{\nearrow}$
14:        if $\exists \xi'' \in$ Explored $: \xi' \subseteq \xi''$ then    ▷ Inclusion check
15:            Depends[$\xi''$] $\leftarrow$ Depends[$\xi''$] $\cup \{\xi\}$
16:        else
17:            Depends[$\xi'$] $\leftarrow$ Depends[$\xi'$] $\cup \{\xi\}$
18:            WaitingExplore $\leftarrow$ WaitingExplore $\cup \{\xi'\}$
19:    WaitingUpdate$_W \leftarrow$ WaitingUpdate$_W \cup \{\xi\}$
20:    WaitingUpdate$_L \leftarrow$ WaitingUpdate$_L \cup \{\xi\}$    ▷ Losing state propagation
21:    Explored $\leftarrow$ Explored $\cup \{\xi\}$

5.1 Pruning


First, we present some pruning techniques, as these only require slight modifications in the exploration procedure. To this end, we introduce the notion of a controller deadlock state. A state is a controller deadlock state if it has no controllable transitions. We define it as the following predicate on symbolic states:


$controller\_deadlock(\xi) \equiv \forall t, \xi' : \text{if } \xi \xrightarrow{t} \xi' \text{ then } t \in T_u$


Now, we introduce the two kinds of pruning: 


Cumulative Pruning: If the projected parameters of a zone in a new symbolic state are included in the current set of winning parameters, we can safely prune the successors of this state. Indeed, if all parameter valuations possible in the zone have already been determined to be winning, no new winning parameter valuation can be found by exploring the successors of this state. This check can be seen in Lines 3 and 4 of Alg. 2.

Coverage Pruning: If a symbolic state is either winning or a controller deadlock state, its successors can safely be pruned. Indeed, if the symbolic state is winning, we gain nothing from exploring further. Dually, a controller deadlock state can never become winning, since the controller has no action to take. This check can be seen in Lines 10 and 11 of Alg. 2.
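The cumulative pruning test of Line 3 of Alg. 2 is an inclusion check between parameter sets. As a simplified sketch, assume a single parameter, so that the projection $\xi{\downarrow_P}$ is a closed interval and WinningParam is a finite union of closed intervals. In the actual setting both are (unions of) polyhedra. The hypothetical helper `covered` decides whether the interval is included in the union.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # closed interval [lo, hi]


def covered(target: Interval, union: List[Interval]) -> bool:
    """True iff [lo, hi] is included in the union of the given closed intervals."""
    lo, hi = target
    reach: Optional[float] = None  # right end of the covered stretch starting at lo
    for a, b in sorted(union):
        if b < lo:
            continue  # interval lies entirely to the left of the target
        if reach is None:
            if a > lo:
                return False  # lo itself is not covered
            reach = b
        elif a > reach:
            return False  # gap between the covered stretch and this interval
        else:
            reach = max(reach, b)
        if reach >= hi:
            return True
    return False


winning = [(1.0, 3.0), (3.0, 5.0), (7.0, 9.0)]
```

Under these assumptions, a new symbolic state whose projected interval is `covered` by the current winning parameters can be discarded without exploring its successors.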
5.2 Inclusion checking


Originally, checking if a symbolic state $\xi'$ has been explored already is done by checking if $\xi' \in$ Explored. The optimization by inclusion checking instead checks if $\exists \xi'' \in$ Explored $: \xi' \subseteq \xi''$. If this is the case, the newly discovered symbolic state can safely be discarded, since a superset of it has already been explored. Of course, the new dependency that $\xi$ depends on $\xi''$ must still be added. This optimization is done in the exploration procedure (Lines 14 to 18 of Alg. 2).
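For non-parametric zones, as in the TG setting of Cassez et al., an inclusion test $\xi' \subseteq \xi''$ is commonly implemented on difference-bound matrices (DBMs). After closing the first DBM under shortest paths, inclusion reduces to an entry-wise comparison. The sketch below is our own illustration with non-strict bounds only. IMITATOR works on parametric zones represented as polyhedra, so this is not how the tool performs the check. Entry `d[i][j]` bounds $x_i - x_j$, index 0 stands for the constant 0, and the helper `zone` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

DBM = List[List[float]]


def zone(n_clocks: int, bounds: Sequence[Tuple[int, float, float]] = (),
         diffs: Sequence[Tuple[int, int, float]] = ()) -> DBM:
    """bounds: (clock, lo, hi) for lo <= x <= hi; diffs: (i, j, c) for x_i - x_j <= c.
    Clocks are numbered from 1 and are non-negative."""
    n = n_clocks + 1
    d = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i in range(1, n):
        d[0][i] = 0.0  # 0 - x_i <= 0
    for c, lo, hi in bounds:
        d[c][0] = min(d[c][0], hi)
        d[0][c] = min(d[0][c], -lo)
    for i, j, c in diffs:
        d[i][j] = min(d[i][j], c)
    return d


def canonical(d: DBM) -> DBM:
    """Tighten every bound with Floyd-Warshall shortest paths."""
    n = len(d)
    c = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if c[i][k] + c[k][j] < c[i][j]:
                    c[i][j] = c[i][k] + c[k][j]
    return c


def included(d1: DBM, d2: DBM) -> bool:
    """Zone of d1 is a subset of the zone of d2 (same clocks)."""
    c1 = canonical(d1)
    n = len(c1)
    if any(c1[i][i] < 0 for i in range(n)):
        return True  # negative cycle: d1 is empty
    return all(c1[i][j] <= d2[i][j] for i in range(n) for j in range(n))


a = zone(2, bounds=[(1, 1, 2), (2, 1, 2)])
b = zone(2, bounds=[(1, 0, 3), (2, 0, 3)])
eq = zone(2, bounds=[(1, 0, 3), (2, 0, 3)], diffs=[(1, 2, 0), (2, 1, 0)])
empty = zone(2, bounds=[(1, 3, 2)])
```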


5.3 Losing state propagation 


Losing state propagation is inspired by Cassez et al. [10] for TG. The idea is that, instead of only discovering and propagating winning states, we now also do the same for losing states, starting from controller deadlock states. A map Lose maintains the currently known losing states for a given symbolic state. Thus, each symbolic state $\xi$ can now be partitioned into three parts:


- Winning: $Win[\xi]$,
- Losing: $Lose[\xi]$,
- Unknown: $\xi \setminus (Win[\xi] \cup Lose[\xi])$.


To initially mark a state as losing, we use the controller deadlock predicate again while also making sure that the state is not winning, as shown in Lines 8 and 9 of Alg. 2. On Lines 19 and 20, we partition the WaitingUpdate list into two lists, $WaitingUpdate_W$ and $WaitingUpdate_L$, for propagating winning and losing states respectively.

While pruning and inclusion checking only required the modification of the 
exploration procedure, the propagation of losing states influences all of the pro- 
cedures of the original algorithm. We go through them now. 


Update procedure. We create a new procedure for updating losing states, shown in Alg. 3. As the dual of the original update procedure, it is almost identical. Instead of Uncontrollable, we compute Controllable, i.e. the union of zones where the controller can lead to a non-losing state. Similarly, instead of WinningMoves, we compute LosingMoves, which is the set of states where the environment can lead to a losing state. We then compute NewLosing, which is the set of states where the environment can lead us to a losing state while avoiding the states from which the controller can move to a non-losing state and the environment has no losing move (Controllable \ LosingMoves). Finally, we update $Lose[\xi]$ and $WaitingUpdate_L$ accordingly.


Terminate function. The terminate function is modified to allow for early termination if all possible information is already known, i.e. $\xi_0^{\nearrow} \setminus (Win[\xi_0^{\nearrow}] \cup Lose[\xi_0^{\nearrow}]) = \emptyset$. Indeed, if for all valuations we have determined that we either win or lose, the algorithm can safely terminate. This is shown in Alg. 4.

The final algorithm is then modified to include the new procedures and data structures introduced. As a result, in the main loop we now have to choose between three waiting lists instead of two: WaitingExplore, $WaitingUpdate_W$ and $WaitingUpdate_L$.
Algorithm 3 Adding new update procedure for losing state propagation


procedure UPDATE_L
    $\xi \leftarrow$ extract(WaitingUpdate$_L$)
    Controllable $\leftarrow \bigcup_{\xi \Rightarrow_c^t \xi'} Pred(t, \xi' \setminus Lose[\xi'])$
    LosingMoves $\leftarrow \bigcup_{\xi \Rightarrow_u^t \xi'} Pred(t, Lose[\xi'])$
    NewLosing := SafePred(Lose[$\xi$] $\cup$ LosingMoves, Controllable $\setminus$ LosingMoves) $\cap\ \xi$
    if Lose[$\xi$] $\subsetneq$ NewLosing then
        WaitingUpdate$_L \leftarrow$ WaitingUpdate$_L \cup$ Depends[$\xi$]; Lose[$\xi$] $\leftarrow$ NewLosing


Algorithm 4 New TERMINATE with early termination if initial zone is covered
function TERMINATE
    isEmpty $\leftarrow$ WaitingExplore $= \emptyset \wedge$ WaitingUpdate $= \emptyset$
    initialZoneCovered $\leftarrow \xi_0^{\nearrow} \subseteq (Win[\xi_0^{\nearrow}] \cup Lose[\xi_0^{\nearrow}])$
    return isEmpty $\vee$ initialZoneCovered


6 Implementation and Experimental Evaluation 


To evaluate the termination behavior and efficiency of the semi-algorithm and 
the optimizations, we implement them in the IMITATOR toolset and measure the 
performance on some realistic case studies. 


6.1 Implementation 


We have implemented our proposed algorithm and optimizations in the IMITATOR model checker [4], which features a wide repertoire of synthesis algorithms for PTA. We have extended its input language to PTG and added our PTG parameter synthesis algorithm, including the optimizations described in Sec. 5. The source code (in OCaml) is available on GitHub.³

In IMITATOR, the user specifies a model consisting of parameters, clocks and 
a network of parametric timed automata. The user can analyse the model using 
an analysis or synthesis query. IMITATOR selects the corresponding algorithm to 
use, after which it outputs the result of the query. 

Our extension enables the user to specify edges in a PTA as (un)controllable, 
effectively turning it into a PTG. Along with this we add a new property Win and 
a corresponding algorithm AlgoPTG. In order to synthesize parameters for a PTG 
one must use property := #synth Win(state predicate), using a predicate 
to define which states are winning. Usually, this predicate is simply accepting, 
meaning that any state in an accepting location of the PTG is winning.

In Alg. 1, we left the choice between exploration and back-propagation non-deterministic. In the implementation, back-propagation is prioritized over exploration whenever possible (i.e. when WaitingUpdate is non-empty). This seems to yield the fastest results in practice.


3 https://github.com/imitator-model-checker/imitator, branch: develop
6.2 Experiment Design


We selected two large case studies, one PTA and one TG, and extended them to PTGs by adding (un)controllable actions and clock parameters, respectively. An artifact containing instructions to run all the experiments is available online [11].


Production Cell. This case study has two conveyor belts (1 / 2), a robot 
with two arms (A / B) and a press. Plates arrive at conveyor belt 1 and are 
taken to the press by robot arm A, where they are processed for some time. 
Robot arm B takes processed plates and removes them through conveyor belt 2. 

We model systems with 1 to 4 plates in IMITATOR. In the goal location, every
plate made it safely to conveyor belt 2. If two plates collide before they are 
picked up by arm A, the game is lost immediately. We assume that the rotation 
speed of the robot arm, the speed of the conveyor belt and the time to press are 
known constants. The aim is to synthesize a parameter MINWAIT, the minimum 
time interval between two plates arriving at the conveyor belt. The maximum 
time interval between two plates is fixed by an additional constant MAXWAIT. 

Our PTG model is largely inspired by the TG model of Cassez et al. [10].
Besides adding parameters, we check for collisions between plates rather than 
defining a maximum waiting time frame. For 2-4 plates, we create a winning 
and a losing configuration of the constants; for 1 plate a collision is not possible. 
The losing configurations are created by setting MAXWAIT too small, which will 
deadlock the system for any value of MINWAIT. 

The IMITATOR model for the 1-plate configuration can be seen in [12, App. C].


Bounded Retransmission Protocol. The BRP provides reliable communication 
over an unreliable channel. We create a PTG from a PTA model of the BRP [6], 
in turn based on a TA model [13], by making message loss uncontrollable. 

In the BRP, a sender sends message frames to a receiver, tagged with an 
alternating bit, through a lossy channel. The receiver acknowledges all frames. 
If the sender does not receive an acknowledgement in time, it retransmits the 
message at most k times, after which the sender gives up. The goal location 
indicates the successful transmission of the message, or the abort by the sender. 


Experimental Setup. All experiments were run on a single core of a computer 
with an Intel Core i5-10400F CPU @ 2.90GHz with 16GB of RAM running 
Ubuntu 20.04.6 LTS. For each implementation (basic, inclusion checking, cumu- 
lative pruning, coverage pruning, losing state propagation) we run the experi- 
ments 5 times and report the average time and state space size. A timeout of 2 
hours is used. 


6.3 Experimental Results 


We present the results of the experiments in Table 1. We do not include the runs without optimizations, as they all timed out. This indicates that inclusion checking is the most vital optimization and should always be enabled.
Table 1. Experimental results for different optimizations: inclusion check (inc), cumulative pruning (cm), coverage pruning (cv), losing state propagation (lp). Running time in seconds (s) and number of symbolic states (size). Green indicates the best results.


                          inc             inc+cm          inc+cm+cv       inc+cm+cv+lp
                          Time    Size    Time    Size    Time    Size    Time    Size
Production Cell
  1 plate    Win          0.06s   86      0.06s   86      0.06s   86      0.08s   86
  2 plates   Win          7.19s   746     7.56s   746     6.60s   701     7.22s   701
             Lose         1.43s   439     1.44s   439     2.03s   517     2.17s   517
  3 plates   Win          36.7s   1900    37.3s   1900    24.0s   1539    34.2s   1539
             Lose         13.4s   1372    13.9s   1372    9.53s   1251    14.2s   1251
  4 plates   Win          4903s   10755   4750s   10755   2394s   9350    3522s   9350
             Lose         34.8s   2605    35.6s   2605    21.6s   2372    153s    2372
Bounded Retransmission Protocol
                          34.3s   1042    32.2s   1042    7.1s    612     7.5s    612


The best results for each row are indicated in green. We can clearly see that coverage pruning has the biggest effect of all the optimizations in our experiments. Losing state propagation does not seem to provide much benefit in these experiments, as its overhead outweighs any positive effect it might have had.


7 Conclusion 


We provide the first implementation of parameter synthesis for Parametric Timed Games with reachability objectives, based on an on-the-fly algorithm [10,16]. It appears that, without additional pruning heuristics, the algorithm cannot handle the case studies, Bounded Retransmission Protocol and Production Cell. Inclusion subsumption is a minimal requirement to achieve any result.

Contrary to previous algorithms for PTA [5] and TG [10], the parameter synthesis algorithm is not guaranteed to terminate, even if the parametric zone graph is finite. However, in the limit, all winning parameter valuations will be enumerated.

We added further pruning techniques (coverage pruning and cumulative pruning) to reduce the search space. These techniques generally increased the speed. We also experimented with propagating losing states, but in our examples the overhead of checking and propagating losing states was not compensated by any pruning potential. Future work could study under which circumstances the propagation of losing states could be beneficial, and also strengthen the detection of (partially) losing states. Another avenue for future work is to study other objectives, like safety games or liveness conditions.


Acknowledgment. We thank Étienne André for his help with integrating our 
algorithm in the IMITATOR tool set. 
Abstract. We provide an algorithm to solve Rabin and Streett games over graphs
with n vertices, m edges, and k colours that runs in Ó (mca!) time and 
O(nklog klog n) space, where O hides poly-logarithmic factors. Our algorithm 
is an improvement by a super quadratic dependence on k! from the currently 
best known run time of O ma tp o obtained by converting a Rabin 
game into a parity game, while simultaneously improving its exponential space 
requirement. 

Our main technical ingredient is a characterisation of progress measures for 
Rabin games using colourful trees and a combinatorial construction of succinctly- 
represented, universal colourful trees. Colourful universal trees are generalisations of universal trees used by Jurdziński and Lazić (2017) to solve parity games, as well as of Rabin progress measures of Klarlund and Kozen (1991).
Our algorithm for Rabin games is a progress measure lifting algorithm where 
the lifting is performed on succinct, colourful, universal trees. 


Keywords: Rabin games · Parity games · Colourful trees


1 Introduction 


A Rabin game is a two-player infinite-duration game played on a directed, coloured 
graph, where each vertex has a finite set of good colours and a finite set of bad colours 
associated with it [29]. The two players Controller and Environment take turns to 
move a token along an edge to form a play, an infinite path in the graph. Such a play 
is winning for Controller if there is a colour that is a good colour for some vertex seen 
infinitely often along the path and is not a bad colour for any vertex seen infinitely 
often. Rabin games lie at the core of reactive synthesis for omega-regular specifica- 
tions and efficient algorithms for Rabin games are of practical interest in synthesis 
tools. 
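Whether a play is winning depends only on the set of vertices seen infinitely often. As an illustration, the following sketch decides whether such a set is winning for Controller. The dictionaries `good` and `bad`, which map each vertex to its sets of good and bad colours, are our own encoding. For an eventually periodic play, the set in question is simply the set of vertices on the repeated cycle.

```python
from typing import Dict, Iterable, Set


def rabin_wins(inf_vertices: Iterable[str],
               good: Dict[str, Set[int]], bad: Dict[str, Set[int]]) -> bool:
    """Controller wins iff some colour is good for a vertex seen infinitely often
    and bad for none of the vertices seen infinitely often."""
    inf_vertices = list(inf_vertices)
    good_seen = set().union(*(good[v] for v in inf_vertices))
    bad_seen = set().union(*(bad[v] for v in inf_vertices))
    return bool(good_seen - bad_seen)


good = {"u": {1}, "v": {2}, "w": set()}
bad = {"u": set(), "v": set(), "w": {1}}
```

With these colours, visiting `w` infinitely often cancels colour 1. Adding `v` to the cycle makes the play winning again, because colour 2 is never bad.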
Rabin automata already appear in McNaughton’s solution of Church’s synthesis problem and in Rabin’s proof of the decidability of SnS [29], where they were first defined in the setting of infinite trees. To solve Church's synthesis problem for ω-regular specifications, represented by non-deterministic Büchi automata, there are two well-studied (polynomial-time equivalent) approaches: either reduce it to the emptiness problem for Rabin tree automata or solve a Rabin game.

Rabin conditions are also known to be suitable specifications for general fairness constraints [15]. Klarlund and Kozen defined Rabin measures over graphs and applied them to prove program termination under a general fairness constraint. Indeed, the acceptance condition that defines strong fairness, i.e., if a given set of actions (edges) is enabled infinitely often (the source vertex is seen infinitely often), it is taken infinitely often, is naturally expressed by the complement of the Rabin condition, called the Streett condition.

Algorithmically, the problem of solving Rabin games was shown to be NP-complete by Emerson and Jutla in the late 1980s. In the same paper, Emerson and Jutla, and independently, Pnueli and Rosner [28], gave an algorithm that takes $O((nk)^{3k})$ time, where $n$ is the number of vertices of the game graph and $k$ the number of colours.

Steady progress was made to solve Rabin games, and within a decade, Kupferman and Vardi reduced the cubic dependence on $n^k$ to a quadratic one by giving an algorithm to check non-emptiness of Rabin tree automata in time $O(mn^{2k}k!)$. Later, Horn gave a different solution to solve Streett games, and therefore Rabin games, with the same running time.

A lot of progress was simultaneously made on parity games [12], a special case of Rabin games where colours are assigned to each subset of states in a chain of subsets. Inspired by fixpoint evaluation algorithms and the small progress measure algorithm of Jurdziński for parity games, Piterman and Pnueli gave a fast $O(mn^k k^2 k!)$-time, $O(nk)$-space algorithm for Rabin games. This algorithm used a concept of a measure to solve Rabin games.

The work of Piterman and Pnueli remained the state of the art for Rabin games until the quasi-polynomial breakthrough for parity games by Calude, Jain, Khoussainov, Li, and Stephan [1]. They gave a fixed-parameter tractable (FPT) algorithm for Rabin games on $k$ colours by converting them into parity games and applying their quasi-polynomial algorithm.

A Rabin game with $n$ vertices, $m$ edges, and $k$ colours can be reduced to a parity game with $N = nk^2k!$ vertices, $M = nk^2k!\,m$ edges, and $K = 2k+1$ colours [12]. By combining this reduction with state-of-the-art algorithms for parity games [18, 8, 14, 9] in a "space-efficient" manner, say that of Jurdziński and Lazić [18], one can solve Rabin games in time $O(\max(MN^{2.38}, 2^{O(K\log K)}))$, but in exponential space (since the parity game is exponentially larger).
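As a rough illustration of this blow-up, the helper below (our own, hypothetical `rabin_to_parity_size`) evaluates the sizes quoted above, $N = nk^2k!$, $M = nk^2k!\,m$ and $K = 2k+1$. It only does the arithmetic; it does not perform the reduction itself.

```python
from math import factorial


def rabin_to_parity_size(n: int, m: int, k: int) -> tuple[int, int, int]:
    """Return (N, M, K) for the parity game obtained from a Rabin game with
    n vertices, m edges and k colours, using N = n k^2 k!, M = N m, K = 2k + 1."""
    blowup = k * k * factorial(k)   # multiplicative factor k^2 k!
    N = n * blowup                  # vertices of the parity game
    M = N * m                       # edges of the parity game
    K = 2 * k + 1                   # priorities of the parity game
    return N, M, K


print(rabin_to_parity_size(n=10, m=30, k=5))  # (30000, 900000, 11)
```

For $k = 5$ the vertex count is multiplied by $k^2k! = 3000$, while the number of priorities stays at 11.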

Substituting the values of $M$ and $N$, the algorithm of Jurdziński and Lazić would take time at least proportional to $m(nk^2k!)^{3.38}$ for games with $n$ vertices, $m$ edges, and $k$ colours. However, the parity game obtained from a Rabin game has far more vertices, $N = nk^2k!$, than colours, $K = 2k+1$; indeed, $K \in o(\log N)$. For such cases, where the number of priorities is $o(\log N)$ (as happens above as $k$ grows), Jurdziński and Lazić also give an analysis of their algorithm that would solve Rabin games in time $O(nm(k!)^{2+o(1)})$. Closely matching this are the running times in the work of Fearnley et al. [14], who provide, among other bounds, a quasi-bi-linear bound of $O(MN\,\alpha(N)\log\log N)$, where $\alpha$ is the inverse-Ackermann function. In either case, this best-known approach has at least a $(k!)^{2+o(1)}$ dependence in its running time, and takes space proportional to $(nk^2k!)\log(nk^2k!)$, which again has a $k!$ dependence.


Our Contribution. Our result breaks through the $(k!)^{2+o(1)}$ barrier while simultaneously using polynomial space, giving a fixed-parameter tractable algorithm for Rabin games. We present a new algorithm for Rabin games on graphs that runs in time $O(mn(k!)^{1+o(1)})$ and space $O(nk\log k\log n)$, for a game with $n$ vertices, $m$ edges, and $k$ colours. Our algorithm improves on the quadratic $(k!)^2$ dependence on the number of colours in the best current algorithms, while using only polynomial space.

Our first technical contribution is a characterisation of winning states in Rabin 
games using "colourful trees," by generalizing previous work on Rabin measures on 
graphs by Klarlund and Kozen [20]. Using our characterisation, we provide an algo- 
rithm to compute winning states and strategies as a fixed point of a lifting function 
over the lattice of functions from vertices of a game to nodes of a colourful tree. 

Our second contribution is the construction of a universal colourful tree that embeds every colourful tree with a given number of leaves and a fixed set of colours. Universal trees underlie all the quasi-polynomial algorithms for parity games [18, 6, 19, 8, 21]. Our construction uses the theory of universal trees developed for parity games, especially that of Jurdziński and Lazić [18]. From our universal colourful trees, we can also naturally construct universal graphs for Rabin objectives, in the sense introduced by Colcombet and Fijalkow. Although constructing universal graphs directly gives us a lifting algorithm, for the sake of completeness we also provide a lifting algorithm that uses our colourful universal trees. Concretely, we show how to construct a small universal colourful tree (our upper bound is tight up to a polynomial factor) that can be succinctly encoded and efficiently navigated.

By applying the lifting algorithm to our succinct universal colourful tree, we get 
our time and space bounds. 

Just as Piterman and Pnueli's result generalized ranking techniques and progress 
measures for parity games, we generalize the notion of measures [20] and universal 
trees [18] central to the fastest algorithms for parity games to obtain our algorithm. 


2 Preliminaries 


We use $\mathbb{N}$ to denote the set of natural numbers $\{0, 1, 2, \ldots\}$. A directed graph consists of a finite set of vertices $V$ together with a binary relation $E$ over $V$, called the edge set. We write $u \to v$ to denote an edge $(u, v) \in E$. A finite (resp. infinite) path in a directed graph is a finite (resp. infinite) sequence of vertices in which every pair of consecutive vertices forms an edge in $E$.


$(c_0, C)$-Colourful Ordered Trees. Let $C$ be a finite set of colours and let $c_0 \notin C$ be a distinguished root colour. Informally, a $(c_0, C)$-colourful ordered tree is an ordered tree of height at most $|C| + 1$ whose root has colour $c_0$ and whose every other node has a colour from $C$. As an exception, we allow some leaves to be uncoloured, which we denote by a "dummy colour" $\bot \notin C$. We also require that along any path from the root to a leaf, all nodes have different colours.
Formally, for a finite set $C$, we define $(c_0, C)$-colourful trees recursively.

- If $C = \emptyset$, then $(c_0, \emptyset)$ and $(c_0, ((\bot, \emptyset), \ldots, (\bot, \emptyset)))$ are $(c_0, \emptyset)$-colourful trees.
- If $C \neq \emptyset$, we say $\mathcal{T}$ is a $(c_0, C)$-colourful tree if it is either
  • a $(c_0, C')$-colourful tree rooted at $c_0$ for some $C' \subsetneq C$; or
  • $\mathcal{T} = (c_0, (\mathcal{T}_1, \ldots, \mathcal{T}_j))$ where, for all $i \in \{1, \ldots, j\}$, either there is a $c_i \in C$ such that $\mathcal{T}_i$ is a $(c_i, C \setminus \{c_i\})$-colourful ordered tree, or $\mathcal{T}_i = (\bot, \emptyset)$. Note that the $c_i$ need not be distinct.
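The recursive definition can be turned directly into a checker. The sketch below is illustrative only: it assumes a hypothetical encoding of an unlabelled tree as a pair `(root_colour, children)`, with `None` standing for ⊥. A $(c_0, C')$-colourful tree with $C' \subsetneq C$ also satisfies the second clause over $C$, so only that clause is checked.

```python
from typing import Optional, Tuple

BOT = None  # stands for the dummy colour ⊥ of uncoloured leaves

# A tree is a pair (root_colour, children), where children is a tuple of trees.
Tree = Tuple[Optional[str], tuple]


def is_colourful(tree: Tree, colours: frozenset) -> bool:
    """Check that tree = (c0, children) is a (c0, colours)-colourful tree:
    every child is either an uncoloured leaf (⊥, ()) or has a colour c that is
    still available, and is itself a (c, colours - {c})-colourful tree."""
    _root_colour, children = tree
    for child in children:
        colour, grandchildren = child
        if colour is BOT:
            if grandchildren:          # ⊥ may only label leaves
                return False
        elif colour not in colours:    # colour already used on this path (or unknown)
            return False
        elif not is_colourful(child, colours - {colour}):
            return False
    return True


t_ok = ("c0", (("a", (("b", ((BOT, ()),)),)), ("a", ()), (BOT, ())))
print(is_colourful(t_ok, frozenset({"a", "b"})))  # True
```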


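We define the concatenation of a $(c_0, C_1)$-colourful tree $\mathcal{T}_1 = (c_0, (\mathcal{T}^1_1, \ldots, \mathcal{T}^1_{j_1}))$ and a $(c_0, C_2)$-colourful tree $\mathcal{T}_2 = (c_0, (\mathcal{T}^2_1, \ldots, \mathcal{T}^2_{j_2}))$ as the $(c_0, C_1 \cup C_2)$-colourful tree $\mathcal{T}_1 \cdot \mathcal{T}_2 = (c_0, (\mathcal{T}^1_1, \ldots, \mathcal{T}^1_{j_1}, \mathcal{T}^2_1, \ldots, \mathcal{T}^2_{j_2}))$. For a root colour $c_0$, a number $\ell \in \mathbb{N}$, and a $(c, C)$-colourful ordered tree $\mathcal{T}$, we write $\mathcal{T}^{\ell}$ for the tree $(c_0, (\mathcal{T}, \mathcal{T}, \ldots, \mathcal{T}))$ with $\ell$ copies of $\mathcal{T}$. When $(c_0, C)$ is clear from context, we simply say "colourful tree."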


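Embedding Colourful Trees. Given a $(c_0, C)$-colourful tree $\mathcal{U}$ and a $(c_0, C')$-colourful tree $\mathcal{T}$ with $C' \subseteq C$, we say $\mathcal{U}$ embeds $\mathcal{T}$ if $\mathcal{T} = (c_0, \emptyset)$, or $\mathcal{T} = (c_0, (\mathcal{T}_1, \ldots, \mathcal{T}_\ell))$ and $\mathcal{U} = (c_0, (\mathcal{U}_1, \ldots, \mathcal{U}_m))$ for some $\ell, m$, and there is an increasing sequence of indices $1 \leq i_1 < i_2 < \cdots < i_\ell \leq m$ such that $\mathcal{U}_{i_j}$ embeds $\mathcal{T}_j$ (recursively) for every $j$. Notice that $\mathcal{U}_{i_j}$ and $\mathcal{T}_j$ must be rooted at the same colour, say $c_j$, and are $(c_j, C \setminus \{c_j\})$-colourful and $(c_j, C' \setminus \{c_j\})$-colourful trees, respectively.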
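Because the children of $\mathcal{T}$ must be matched to an increasing sequence of indices, embedding can be tested greedily, much like checking that one sequence is a subsequence of another. A sketch over the same hypothetical `(colour, children)` encoding (with `None` for ⊥); `embeds` is our name, and no attempt is made at efficiency:

```python
BOT = None  # dummy colour ⊥


def embeds(big, small) -> bool:
    """Does the colourful tree `big` embed `small`?  Trees are (colour, children)
    pairs.  Both roots must carry the same colour, and the children of `small`
    must embed, recursively, into an increasing subsequence of the children of `big`."""
    if big[0] != small[0]:
        return False
    big_children = big[1]
    j = 0
    for child in small[1]:
        # Greedy: match `child` with the leftmost unused child of `big` that embeds
        # it; an earlier match never leaves fewer options for the later children.
        while j < len(big_children) and not embeds(big_children[j], child):
            j += 1
        if j == len(big_children):
            return False
        j += 1
    return True


big = ("c0", (("b", ()), ("a", ((BOT, ()),)), (BOT, ())))
print(embeds(big, ("c0", (("a", ()), (BOT, ())))))  # True
print(embeds(big, ("c0", ((BOT, ()), ("a", ())))))  # False: order matters
```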


Labelled Colourful Trees. In what follows, we shall additionally label colourful trees 
with labels from some linearly ordered set. It is more convenient to define such la- 
belled colourful trees as prefix-closed sets of sequences, using the isomorphism be- 
tween a (recursively defined) tree and its set of paths. 

Let $L$ be a set of labels with a linear ordering $\leq_L \subseteq L \times L$. An $L$-labelled $(c_0, C)$-colourful tree is a prefix-closed set of sequences over $L \times (C \cup \{\bot\})$, the Cartesian product of $L$ and $C \cup \{\bot\}$.

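Given an element $\tau_0 \in L \times (C \cup \{\bot\})$ and a sequence $(\tau_1, \tau_2, \ldots, \tau_j) \in (L \times (C \cup \{\bot\}))^*$, we use $\odot$ to denote prepending the element to the sequence: $\tau_0 \odot (\tau_1, \tau_2, \ldots, \tau_j) = (\tau_0, \tau_1, \tau_2, \ldots, \tau_j)$. We extend this notation to a set of sequences $\mathcal{S}$ by defining $\tau_0 \odot \mathcal{S} = \{(\tau_0, \tau_1, \ldots, \tau_j) \mid (\tau_1, \ldots, \tau_j) \in \mathcal{S}\}$.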

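We say a prefix-closed set $\mathcal{L} \subseteq (L \times (C \cup \{\bot\}))^*$ is an $L$-labelling of a $(c_0, C)$-colourful ordered tree $\mathcal{T}$
- if $\mathcal{T} = (c_0, ((\bot, \emptyset), \ldots, (\bot, \emptyset)))$ has $m$ uncoloured leaves and $\mathcal{L}$ is the prefix closure of the set $\{(a_1, \bot), \ldots, (a_m, \bot)\}$ (each element viewed as a sequence of length one) for some $a_1 <_L a_2 <_L \cdots <_L a_m \in L$;
- if $\mathcal{T} = (c_0, (\mathcal{T}_1, \ldots, \mathcal{T}_m))$, then $\mathcal{L}$ is the prefix closure of the set

$$(a_1, c_1) \odot \mathcal{L}_1 \cup (a_2, c_2) \odot \mathcal{L}_2 \cup \cdots \cup (a_m, c_m) \odot \mathcal{L}_m$$

for some $a_1 \leq_L a_2 \leq_L \cdots \leq_L a_m$ in $L$, such that for all $j$:

  • $\mathcal{T}_j$ is a $(c_j, C \setminus \{c_j\})$-colourful tree rooted at $c_j$ and $\mathcal{L}_j$ is an $L$-labelling of $\mathcal{T}_j$,
  • $c_j \in C \cup \{\bot\}$, and
  • whenever $a_j = a_{j+1}$, we have $c_j \neq c_{j+1}$.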


Note that the root colour $c_0$ of $\mathcal{T}$ does not appear in $\mathcal{L}$; instead of tracking $c_0$ explicitly along with $\mathcal{L}$, we implicitly assume that the root colour of the tree $\mathcal{L}$ is $c_0$.

We refer to elements of the prefix-closed set $\mathcal{L}$ of a labelled tree as nodes of the tree. For two nodes $\eta_1$ and $\eta_2$ in $\mathcal{L}$, we define their greatest common ancestor, written $\mathrm{GCA}(\eta_1, \eta_2)$, as the longest common prefix of $\eta_1$ and $\eta_2$. We say $\eta_1$ is an ancestor of $\eta_2$ if $\eta_1 = \mathrm{GCA}(\eta_1, \eta_2)$. In particular, $\eta_1$ is the parent of $\eta_2$, written $\eta_1 = \mathrm{parent}(\eta_2)$, if $\eta_1$ is the longest node other than $\eta_2$ such that $\eta_1 = \mathrm{GCA}(\eta_1, \eta_2)$; we then say $\eta_2$ is a child of $\eta_1$.

The colour of a node is defined to be the last colour occurring in the sequence: for the empty sequence $()$, we define $\mathrm{colour}(()) = c_0$, and $\mathrm{colour}(((a_1, c_1), \ldots, (a_j, c_j))) = c_j$. Furthermore, we define $\mathrm{ColourSet} : \mathcal{L} \to 2^{C \cup \{c_0\}}$, which maps a node to the set of colours seen from the root to that node: $\mathrm{ColourSet}(\eta) = \{\mathrm{colour}(\eta') \mid \eta' = \mathrm{GCA}(\eta', \eta)\} \setminus \{\bot\}$.
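Once a node is stored as a sequence, these notions reduce to prefix manipulations. A minimal sketch, assuming nodes are Python tuples of `(label, colour)` pairs with `None` standing for ⊥ (all helper names are ours):

```python
BOT = None  # dummy colour ⊥

# A node of an L-labelled tree is a tuple of (label, colour) pairs; () is the root.


def gca(n1: tuple, n2: tuple) -> tuple:
    """Greatest common ancestor = longest common prefix of the two sequences."""
    i = 0
    while i < min(len(n1), len(n2)) and n1[i] == n2[i]:
        i += 1
    return n1[:i]


def is_ancestor(n1: tuple, n2: tuple) -> bool:
    """n1 is an ancestor of n2 iff n1 = GCA(n1, n2); every node is its own ancestor."""
    return gca(n1, n2) == n1


def parent(node: tuple) -> tuple:
    """The longest proper prefix; only defined for non-root nodes."""
    if not node:
        raise ValueError("the root has no parent")
    return node[:-1]


def colour(node: tuple, root_colour):
    """Last colour in the sequence, or the root colour c0 for the empty sequence."""
    return node[-1][1] if node else root_colour


def colour_set(node: tuple, root_colour) -> set:
    """Colours seen on the path from the root to `node` (inclusive), without ⊥."""
    return ({root_colour} | {c for _label, c in node}) - {BOT}


n = ((1, "a"), (2, "b"), (1, BOT))
print(sorted(colour_set(n, "c0")))  # ['a', 'b', 'c0']
```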


Ordering. We define an ordering $\leq_{\mathcal{L}}$ on $\mathcal{L}$. First, we fix an arbitrary linear order $\leq$ on the set $C$ and make the colour $\bot$ larger than all colours in $C$. We then extend $\leq_L$ on $L$ and $\leq$ on $C \cup \{\bot\}$ lexicographically to a linear order on $L \times (C \cup \{\bot\})$: we declare $(a_1, c_1) < (a_2, c_2)$ if either $a_1 <_L a_2$, or $a_1 = a_2$ and $c_1 < c_2$.

For two nodes $\eta_1, \eta_2 \in \mathcal{L}$, we define $\eta_1 <_{\mathcal{L}} \eta_2$ if either $\eta_1$ is a strict prefix of $\eta_2$, or $\eta_1$ is lexicographically smaller than $\eta_2$ when viewed as sequences over $L \times (C \cup \{\bot\})$.
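Python's built-in tuple comparison is lexicographic and treats a strict prefix as smaller, so the order $<_{\mathcal{L}}$ only needs a key that ranks each `(label, colour)` pair. In the sketch below the colour names and their order `COLOUR_ORDER` are hypothetical, and `None` stands for ⊥:

```python
BOT = None                                    # dummy colour ⊥
COLOUR_ORDER = ["red", "green", "blue"]       # hypothetical fixed order on C


def element_key(elem):
    """Key for (label, colour): compare labels first, then colours, with ⊥ largest."""
    label, colour = elem
    rank = len(COLOUR_ORDER) if colour is BOT else COLOUR_ORDER.index(colour)
    return (label, rank)


def node_key(node: tuple) -> tuple:
    """Python compares tuples lexicographically and treats a strict prefix as
    smaller, which is exactly the order on nodes defined above."""
    return tuple(element_key(e) for e in node)


nodes = [((2, "red"),), ((1, BOT),), (), ((1, "green"), (1, "red")), ((1, "green"),)]
print(sorted(nodes, key=node_key))
# [(), ((1, 'green'),), ((1, 'green'), (1, 'red')), ((1, None),), ((2, 'red'),)]
```

The resulting order has the shape described in Example 1 below: a parent precedes its descendants, and an uncoloured node comes after a coloured sibling with the same label.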

Due to space constraints, the missing proofs can be found in the full version of 
the paper. 


Example 1. Figure 1 depicts a colourful tree in which the nodes drawn as ○ are uncoloured. We fix an ordering on the set of colours in which $\bot$ is the largest colour. A labelling of this tree over $L = \{1, 2\} \subseteq \mathbb{N}$ is the prefix closure of the following set
{(le,le,10), (15,16,26,10), (15,16,26,20), (15,106,106), (15,16,26,20), (10), (26,2^), 
(26,16,1-,10), (26,16,20)}. The ordering $\leq_{\mathcal{L}}$ (represented by $<$) on some of the nodes is as follows: () < (1-) < (1^, le) < (10) < (2e, 2%). In the tree in the figure, the ordering decreases when we move from a child to its parent or move "left" in the tree, and increases otherwise.
Fig. 1: A colourful tree.
Fig. 2: A colourful Rabin graph $\mathcal{G}$ where all infinite paths satisfy the Rabin condition.


3 Rabin measure and Colourful Decompositions 


In this section, our aim is to understand the Rabin acceptance condition on graphs. 
We define such acceptance conditions and provide a local witness called a Rabin 
measure for graphs where all paths satisfy the Rabin condition. 

A $(c_0, C)$-colourful Rabin graph $\mathcal{G}$ consists of (1) a directed graph $(V, E)$, (2) a finite set $C$ of colours and a special colour $c_0 \notin C$, and (3) for each vertex $v \in V$, a set of good colours $G_v \subseteq C \cup \{c_0\}$ and a set of bad colours $B_v \subseteq C$. Observe that $c_0 \notin B_v$ for any $v$. We call each colour $c$ in $G_v$ a good colour for $v$, and each colour in $B_v$ a bad colour for $v$.

We assume every vertex has some outgoing edge in the directed graph. An infinite path in $\mathcal{G}$ satisfies the Rabin condition if there is some colour $c$ in $C \cup \{c_0\}$ such that $c$ is a good colour for some vertex seen infinitely often along the path, and $c$ is not a bad colour for any vertex seen infinitely often along the path.
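The Rabin condition depends only on the set of vertices visited infinitely often (for a lasso-shaped path, the vertices of its cycle). The following sketch checks it for such a set; the dictionaries `good` and `bad`, standing for $G_v$ and $B_v$, and the example colours are hypothetical.

```python
def satisfies_rabin(inf_vertices, good, bad) -> bool:
    """Rabin condition for an infinite path, given the set of vertices it visits
    infinitely often: some colour is good for at least one of these vertices
    and bad for none of them."""
    good_somewhere = set().union(*(good[v] for v in inf_vertices))
    bad_somewhere = set().union(*(bad[v] for v in inf_vertices))
    return bool(good_somewhere - bad_somewhere)


good = {"x": {"red"}, "y": {"blue"}, "z": set()}
bad = {"x": {"blue"}, "y": set(), "z": {"red"}}
print(satisfies_rabin({"x", "y"}, good, bad))  # True: red is good at x, bad nowhere
print(satisfies_rabin({"x", "z"}, good, bad))  # False: red is bad at z
```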


Example 2. Consider the colourful Rabin graph $\mathcal{G}$ in Fig. 2. The good colours of each vertex are drawn as smiley faces in the same colour, and its bad colours as sad faces. Although a vertex can have more than one good colour (or bad colour), in this example each vertex has at most one good and one bad colour. For instance, the leftmost vertex of $\mathcal{G}$ in Fig. 2 has a singleton set of good colours and a singleton set of bad colours, while the topmost vertex has a singleton set of good colours and an empty set of bad colours. Observe that in $\mathcal{G}$, every infinite path satisfies the Rabin condition: for any infinite path, some colour is not a bad colour for any vertex that occurs infinitely often, and is a good colour for some vertex that occurs infinitely often. For example, if a path visits all vertices of $\mathcal{G}$ infinitely often, then the good colour of the topmost vertex is not a bad colour of any vertex.


Unlike the definitions of Rabin games in the literature, which represent the acceptance condition by Rabin pairs (a pair of subsets of vertices associated to each colour), we associate two sets of colours with each vertex. This changes the representation size by at most a constant factor.
A Measure for Rabin Graphs. We fix a $(c_0, C)$-colourful Rabin graph with underlying graph $(V, E)$, good colours $G_v$ and bad colours $B_v$ for each vertex $v$. Let $L$ be a linearly ordered set of labels, and let $\mathcal{L}$ be an $L$-labelled $(c_0, C)$-colourful tree. We define $\mathcal{L}^{\top} = \mathcal{L} \cup \{\top\}$ by adjoining an element $\top$ to $\mathcal{L}$, and we extend the ordering $\leq_{\mathcal{L}}$ (henceforth denoted $\leq$) to $\mathcal{L}^{\top}$ by declaring $t < \top$ for all $t \in \mathcal{L}$.

Consider a map $\mu : V \to \mathcal{L}^{\top}$. We call an edge $u \to v$ consistent with respect to $\mu$ if either $\mu(u) = \top$, or the edge satisfies the condition $(G_{>} \lor G_{\geq}) \land B$, where $G_{>}$, $G_{\geq}$, and $B$ are defined below.


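$$
\begin{aligned}
&(G_{>}) && \mu(u) > \mu(v)\\
&(G_{\geq}) && \mathrm{GCA}(\mu(u), \mu(v)) = \mu(u) \text{ and } \mathrm{colour}(\mu(u)) \in G_u\\
&(B) && \mathrm{ColourSet}(\mu(u)) \cap B_u = \emptyset
\end{aligned}
$$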


In words, $G_{>}$ says that the measure $\mu$ decreases along the edge $u \to v$, and $G_{\geq}$ says that the measure may increase along the edge, but only into a descendant of $\mu(u)$, and only when the colour of $\mu(u)$ is a good colour for $u$. Condition $B$ says that no colour on the path from the root to $\mu(u)$ is a bad colour for $u$.

If the map $\mu$ is clear from the context, we call an edge or a vertex consistent without mentioning the map. We consider $\mathrm{GCA}(\cdot, \top)$ and $\mathrm{colour}(\top)$ to be undefined, and conditions $G_{\geq}$ and $B$ to be unsatisfied, when $\mu(v) = \top$ and $\mu(u) \neq \top$.
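The three conditions can be checked mechanically for a single edge. The sketch below uses our own encoding, not the paper's implementation: nodes are tuples of `(label, colour)` pairs with integer labels and colours, the root colour $c_0$ is `0`, ⊥ is encoded as `float("inf")` so that it compares above every colour, and ⊤ is `None`.

```python
BOT = float("inf")  # dummy colour ⊥, larger than every colour in C
TOP = None          # the adjoined element ⊤, larger than every node of the tree


def colour(node, c0):
    return node[-1][1] if node else c0


def colour_set(node, c0):
    """Colours on the path from the root to `node`, excluding ⊥."""
    return ({c0} | {c for _label, c in node}) - {BOT}


def edge_consistent(mu, u, v, good, bad, c0=0):
    """Is the edge u -> v consistent with respect to mu?

    With integer labels and colours, Python's built-in tuple order coincides
    with the order on nodes (prefixes first, labels before colours, ⊥ last)."""
    mu_u, mu_v = mu[u], mu[v]
    if mu_u is TOP:
        return True
    if mu_v is TOP:
        return False                                 # G> and G>= cannot hold
    g_gt = mu_u > mu_v                               # (G>) measure decreases
    g_ge = (mu_v[:len(mu_u)] == mu_u                 # (G>=) mu(v) descends from mu(u)
            and colour(mu_u, c0) in good[u])         #       and colour(mu(u)) is good for u
    b_ok = colour_set(mu_u, c0).isdisjoint(bad[u])   # (B) no bad colour on the root path
    return (g_gt or g_ge) and b_ok


mu = {"u": ((1, 1),)}
print(edge_consistent(mu, "u", "u", {"u": {1}}, {"u": set()}))  # True, via G>=
```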

We say the map $\mu$ is a $(c_0, C)$-colourful Rabin measure for a graph $\mathcal{G}$ if all edges in $E$ are consistent with respect to $\mu$. A map from the vertices of a Rabin graph to the nodes of a tree turns every infinite path into an infinite sequence of tree nodes. If the map is a Rabin measure, it serves as a witness that every infinite path starting from a vertex not mapped to $\top$ satisfies the Rabin condition.

Our definition is a modification of Klarlund and Kozen's [20] notion of Rabin measures, following recent approaches to faster algorithms for parity games [18, 8].


Colourful Decomposition. The Rabin measure, like other progress measures, is based exclusively on local properties: we have a Rabin measure as soon as each edge satisfies certain conditions. Before we show that Rabin measures capture the winning sets of a graph, we define an intermediate structure, which we call a colourful decomposition. Colourful decompositions of a Rabin graph expose a recursive structure that captures the acceptance of all paths in a way that relates naturally to colourful trees. They generalise the attractor decompositions of parity games to Rabin games.

Consider a $(c_0, C)$-colourful Rabin graph $\mathcal{G}$. A $(c_0, C)$-colourful decomposition $\mathcal{D}$ of $\mathcal{G}$ is a recursive subdivision of the vertices $V$ of $\mathcal{G}$, defined as follows. If $C = \emptyset$, then $\mathcal{D} := (V)$ is a $(c_0, C)$-colourful decomposition if and only if all infinite paths from all vertices in $V$ visit a vertex $v$ such that $c_0 \in G_v$. Otherwise, if $C \neq \emptyset$ and $|V| \geq 1$, then

$$\mathcal{D} = (A, (c_1, V_1, \mathcal{D}_1, A_1), \ldots, (c_j, V_j, \mathcal{D}_j, A_j))$$

is a $(c_0, C)$-colourful decomposition if it satisfies the following conditions:
- $A$ is the set of all vertices in $V$ from which all infinite paths in $\mathcal{G}$ visit some vertex $v \in V$ such that $c_0 \in G_v$;
- set $W_1 = V \setminus A$; for $i \in \{1, \ldots, j\}$,
  • $V_i \subseteq W_i$ is a set of vertices that has no path to $W_i \setminus V_i$, and $c_i \notin B_v$ for all $v \in V_i$;
  • $\mathcal{D}_i$ is a $(c_i, C \setminus \{c_i\})$-colourful decomposition of $V_i$;
  • $A_i$ is the set of all vertices in $W_i$ from which all infinite paths within $W_i$ visit some vertex in $V_i$;
  • $W_{i+1} = W_i \setminus A_i$;
- $W_{j+1} = \emptyset$.
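The sets $A$ and $A_i$ are both instances of one computation: the vertices from which every infinite path within a given vertex set visits a given target. A minimal sketch as a backward least fixpoint, assuming edges are given as a list of pairs (the function name is ours). With it, $A$ would be `all_paths_visit(V, E, {v | c0 in G_v})` and $A_i$ would be `all_paths_visit(W_i, E, V_i)`.

```python
def all_paths_visit(vertices, edges, target):
    """Vertices of `vertices` from which every infinite path that stays inside
    `vertices` visits `target`.  Computed as a least fixpoint: a vertex is added
    once all of its successors inside `vertices` have been added.  A vertex with
    no successor inside `vertices` is added too, since no infinite path starts there."""
    vertices = set(vertices)
    result = set(target) & vertices
    changed = True
    while changed:
        changed = False
        for v in vertices - result:        # iterate over a snapshot of the rest
            succs = [w for (x, w) in edges if x == v and w in vertices]
            if all(w in result for w in succs):
                result.add(v)
                changed = True
    return result


edges = [(1, 2), (2, 3), (3, 3), (4, 4), (4, 1)]
print(sorted(all_paths_visit({1, 2, 3, 4}, edges, {3})))  # [1, 2, 3]
```

Vertex 4 is excluded because its self-loop gives an infinite path that never reaches vertex 3.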


Fig. 3: A colourful decomposition and tree for a Rabin measure. (a) A colourful decomposition of a Rabin graph $\mathcal{G}$ where all paths satisfy the Rabin condition. (b) A labelled colourful tree into which the graph $\mathcal{G}$ has a Rabin measure.


The crux of this section is Theorem 1 below, which shows that a Rabin graph has a Rabin measure if and only if it has a colourful decomposition, if and only if all its infinite paths satisfy the Rabin condition.


Theorem 1. The following three statements are equivalent for a $(c_0, C)$-colourful Rabin graph $\mathcal{G}$.

1. All infinite paths in $\mathcal{G}$ satisfy the Rabin condition.

2. There is a $(c_0, C)$-colourful decomposition $\mathcal{D}$ of the vertices of $\mathcal{G}$.

3. For some infinite linearly ordered set $L$, there is an $L$-labelled $(c_0, C)$-colourful Rabin measure for $\mathcal{G}$ in which no vertex is mapped to $\top$.


The theorem above is proved by showing $1 \Rightarrow 2$, $2 \Rightarrow 3$, and finally $3 \Rightarrow 1$.


Proof Sketch ($1 \Rightarrow 2$). If $C$ is empty, then the decomposition of a $(c_0, \emptyset)$-colourful graph where all paths satisfy the Rabin condition is $\mathcal{D} = (V)$. If $C$ is not empty, we first remove from $\mathcal{G}$ the set $A$ of vertices from which every infinite path visits a vertex for which $c_0$ is a good colour. Consider the decomposition of the graph induced by $V \setminus A$ into strongly connected components (SCCs). Each infinite path satisfies the Rabin condition, in particular an infinite path that visits all vertices of some bottom SCC $V_1$ infinitely often. Hence there must be a colour $c$ that is not a bad colour for any vertex of $V_1$ and is a good colour for at least one vertex of $V_1$. One can therefore inductively construct a $(c, C \setminus \{c\})$-colourful decomposition $\mathcal{D}_1$ of $V_1$. Next, removing from $\mathcal{G}$ the vertices of $A$, the vertices of $V_1$, and the set $A_1$ of all vertices from which all paths lead to $V_1$, we again obtain a graph where all infinite paths satisfy the Rabin condition. By induction, this graph has a $(c_0, C)$-colourful decomposition $\mathcal{D}'$. We finally "glue" together $(A, (c, V_1, \mathcal{D}_1, A_1))$ and $\mathcal{D}'$.


Proof Sketch ($2 \Rightarrow 3$). The proof is a recursive construction of an $L$-labelled $(c_0, C)$-colourful tree, where the recursion follows the structure of the decomposition. Fig. 3 illustrates how such a map into a tree is obtained from a decomposition. The decomposition $\mathcal{D}$ of the graph $\mathcal{G}$ there is $(A, (c_1, V_1, \mathcal{D}_1, A_1), (c_2, V_2, \mathcal{D}_2, A_2))$, and some of its sets are indicated in Fig. 3a. The measure obtained from the decomposition into the given tree is intuitive. For example, the vertex for which the root colour $c_0$ is a good colour is mapped to the root of the tree. Similarly, the vertex in $V_1$ for which $c_1$ is a good colour is mapped to the node $((1, c_1))$, and the only vertex in $A_2 \setminus V_2$ is mapped to the node $((2, \bot))$.


Proof Sketch ($3 \Rightarrow 1$). If there is a Rabin measure, each edge of an infinite path satisfies $B$ as well as $G_{>}$ or $G_{\geq}$. For such an infinite path, we consider the infinite sequence of nodes of the colourful tree obtained by taking the image of the run under $\mu$. In this sequence, consider the smallest node $t$ of the tree that is visited infinitely often, and let $c = \mathrm{colour}(t)$. We show that $t$ is a common ancestor of all elements of the sequence after a finite prefix. Since all edges satisfy $G_{>}$ or $G_{\geq}$, $c$ is a colour such that $c \in G_v$ for some $v$ visited infinitely often. As all edges satisfy $B$, we have $c \notin B_v$ for all vertices $v$ in the run after some finite prefix.


Remark 1. A statement similar to the equivalence of items 1 and 2 has been proved by Klarlund and Kozen [20]; however, a reader familiar with their work may notice some differences in the definitions of a measure and of a colourful tree. Our definition of colourful trees is more restrictive than theirs. For instance, colourful trees in the work of Klarlund and Kozen have no restriction on the colours along a path: in their definition, a tree can repeat a colour along a path, and in fact only a partial colouring is required. However, an examination of their proof reveals that, in the direction where they construct a Rabin measure, they inherently use a construction that produces a map into colourful trees as we have defined them, and therefore it is enough to consider only such trees. We make this explicit and also prove Theorem 1 in the appendix to suit our setting.


4 A lifting algorithm


In this section, we formally define Rabin games and first show that Rabin games also admit a notion of Rabin measure.
Inspired by the breakthrough algorithms for parity games, Colcombet, Fijalkow, Gawrychowski, and Ohlmann [5] proposed a formalism for algorithms that solve games in which one player has positional strategies. They showed that a special kind of graph homomorphism into a graph with a total order on its vertices yields a lifting algorithm for such games. Combining their result [5, Theorem 3.1] with Theorem 4, we can show that the Rabin measures defined in the previous section also yield a lifting algorithm for Rabin games. However, to keep this work self-contained and to give an explicit, space-efficient algorithm over our non-trivial totally ordered set, we describe the lifting step by step in this section. We believe this explicit description will help future implementations of such algorithms.

A $(c_0, C)$-colourful Rabin game $\mathfrak{G}$ consists of an arena, which is a $(c_0, C)$-colourful Rabin graph $\mathcal{G}$ with vertices $V$, a start vertex $v_0 \in V$, and a partition of $V$ into $V_C$ and $V_E$, the vertices of the two players, whom we call Controller and Environment, respectively.

A positional strategy $\sigma$ for Controller is a subset of the edges outgoing from Controller's vertices $V_C$. We denote the graph restricted to a strategy $\sigma$ for Controller by $\mathcal{G}|_{\sigma}$; it is the Rabin graph over the same vertex set whose edge relation contains exactly the edges in $\sigma$ together with all edges leaving Environment's vertices.
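Restricting the arena to a positional strategy is just a filter on the edge relation. A sketch, assuming $\sigma$ is given as a set of edges and the edge relation as a list of pairs (the function and vertex names are illustrative):

```python
def restrict_to_strategy(edges, controller_vertices, strategy):
    """Edges of G|σ: the edges in `strategy` leaving Controller vertices,
    plus every edge leaving an Environment vertex."""
    strategy = set(strategy)
    return [(u, v) for (u, v) in edges
            if (u in controller_vertices and (u, v) in strategy)
            or u not in controller_vertices]


edges = [("c", "a"), ("c", "b"), ("a", "a"), ("b", "b")]
print(restrict_to_strategy(edges, {"c"}, {("c", "a")}))
# [('c', 'a'), ('a', 'a'), ('b', 'b')]
```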

The Rabin game $\mathfrak{G}$ is winning for Controller if and only if there exists a positional strategy $\sigma$ for Controller such that all infinite paths starting from $v_0$ in $\mathcal{G}|_{\sigma}$ satisfy the Rabin condition. We describe an algorithm that identifies whether a Rabin game $\mathfrak{G}$ is winning for Controller, using Rabin measures on graphs.


Remark 2. We only consider positional strategies for Controller; this suffices by the results of Emerson and Jutla [13], which show that Controller always has a positional winning strategy in a Rabin game if it has any winning strategy at all.


Consistency in games. Consider a $(c_0, C)$-colourful Rabin game $\mathfrak{G}$, and let $\mu$ be a function from the vertices $V$ of the game graph to $\mathcal{L}^{\top}$, for an $L$-labelled $(c_0, C)$-colourful tree $\mathcal{L}$. We extend the definition of consistency from graphs to games: a vertex is consistent with respect to $\mu$ in $\mathfrak{G}$ if either it belongs to Environment and all its outgoing edges are consistent in $\mathcal{G}$, or it belongs to Controller and at least one of its outgoing edges is consistent in $\mathcal{G}$. A map $\mu : V \to \mathcal{L}^{\top}$ is a Rabin measure for the $(c_0, C)$-colourful Rabin game $\mathfrak{G}$ if and only if all vertices are consistent with respect to $\mu$.


An overview of the algorithm. Our algorithm identifies whether a Rabin game $\mathfrak{G}$ is winning for Controller using the Rabin measures defined earlier for Rabin graphs. Given a colourful tree, the algorithm determines whether there is a Rabin measure that maps the vertices of the game into nodes of that tree. It starts with the smallest map, in which all vertices are mapped to the root of the tree, and at each step, if some vertex is not consistent, it increases the value of the map at that vertex only. The values are increased until either all vertices are consistent or no value can be increased any more.

Toward formally defining this algorithm, we define monotonic, inflationary operators on the set of all maps from the vertices of a game to a tree, such that the simultaneous fixpoints of these operators correspond exactly to Rabin measures.

Consider a map $\mu$ from the vertices $V$ of a $(c_0, C)$-colourful Rabin game $\mathfrak{G}$ to $\mathcal{L}^{\top}$, for an $L$-labelled $(c_0, C)$-colourful tree $\mathcal{L}$. We define a function $\mathrm{lift}_{\mu}$, which maps the edges $E$ of the arena to $\mathcal{L}^{\top}$. For an edge $u \to v$ of $\mathcal{G}$, we define $\mathrm{lift}_{\mu}(u, v)$ to be the smallest element $t \in \mathcal{L}^{\top}$ such that (1) $t \geq \mu(u)$ and (2) the edge $u \to v$ is consistent with respect to the map $\mu[u := t]$. Here $\mu[u := t]$ denotes the map $\mu'$ with $\mu'(x) = \mu(x)$ if $x \neq u$ and $\mu'(x) = t$ if $x = u$.
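Read literally, $\mathrm{lift}_{\mu}(u, v)$ is a linear scan over the nodes from $\mu(u)$ upwards. The sketch below assumes the whole tree is available as an explicitly sorted list, which is only practical for small trees (the succinct encoding developed in Section 5 is meant to avoid this), and takes the consistency check as a parameter; all names are hypothetical.

```python
TOP = None  # the adjoined maximal element ⊤


def lift_edge(sorted_nodes, mu, u, v, consistent):
    """lift_mu(u, v): the smallest t >= mu[u] in the tree extended with ⊤ such
    that the edge u -> v is consistent for mu[u := t].

    `sorted_nodes` lists the nodes of the tree in increasing order, and
    `consistent(mu, u, v)` is a caller-supplied consistency check."""
    if mu[u] is TOP:
        return TOP
    start = sorted_nodes.index(mu[u])
    for t in sorted_nodes[start:]:
        if consistent({**mu, u: t}, u, v):   # {**mu, u: t} is mu[u := t]
            return t
    return TOP                               # every edge out of u is consistent at ⊤


# Toy example: call an edge "consistent" when the measure strictly decreases.
nodes = [(), ((1, 1),), ((2, 1),)]


def decreasing(m, a, b):
    return m[a] is TOP or (m[b] is not TOP and m[a] > m[b])


print(lift_edge(nodes, {"a": (), "b": ((1, 1),)}, "a", "b", decreasing))  # ((2, 1),)
```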

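For each vertex $v$, we define an operator $\mathrm{Lift}_v$ on the lattice of all maps from $V$ to $\mathcal{L}^{\top}$. The operator $\mathrm{Lift}_v$ modifies an input map $\mu$ only at $v$. We define

$$\mathrm{Lift}_v(\mu)(u) = \begin{cases} \mu(u) & \text{if } u \neq v,\\ \min_{(v, w) \in E} \mathrm{lift}_{\mu}(v, w) & \text{if } u = v \in V_C,\\ \max_{(v, w) \in E} \mathrm{lift}_{\mu}(v, w) & \text{if } u = v \in V_E. \end{cases}$$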


Proposition 1. The function $\mathrm{Lift}_v$ is monotonic for each $v$.


The above proposition follows from the definition of $\mathrm{Lift}_v$. Since each $\mathrm{Lift}_v$ is moreover inflationary, the least simultaneous fixpoint of the operators $\mathrm{Lift}_v$ above the map that sends all vertices to the root of $\mathcal{L}$ exists (by the Knaster-Tarski theorem [31]). The following proposition, which follows almost directly from our definitions, states that such fixpoints correspond to Rabin measures.


Proposition 2. For a $(c_0, C)$-colourful Rabin game $\mathfrak{G}$ with vertex set $V$ and a fixed $L$-labelled $(c_0, C)$-colourful tree $\mathcal{L}$,

- any simultaneous fixpoint of the functions $\mathrm{Lift}_v$ for all $v \in V$ is a Rabin measure;
- any Rabin measure is a simultaneous fixpoint of $\mathrm{Lift}_v$ for all $v \in V$.


Our algorithm, like other progress-measure algorithms, computes a fixpoint, and is described below. Its correctness follows from Propositions 1 and 2.


Algorithm 1 The lifting algorithm on a $(c_0, C)$-colourful Rabin game $\mathfrak{G}$ with vertices $V$ into the tree $\mathcal{L}$
Require: For each $v \in V$, $\mu(v)$ is initialised to the root of $\mathcal{L}$

1: while there is some vertex $v$ that is inconsistent with respect to $\mu$ do
2:     $\mu \leftarrow \mathrm{Lift}_v(\mu)$
3: end while
4: return $\mu$
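For small trees listed explicitly, Algorithm 1 can be executed directly. This is a minimal sketch under our own encoding (integer labels and colours, root colour `0`, ⊥ as `float("inf")`, ⊤ as `None`). It computes $\mathrm{lift}_{\mu}$ by a linear scan rather than by the navigation of Lemma 1, assumes every vertex has an outgoing edge, and all names are hypothetical.

```python
BOT = float("inf")  # dummy colour ⊥ (largest colour)
TOP = None          # adjoined top element ⊤


def colour(node, c0):
    return node[-1][1] if node else c0


def edge_ok(mu, u, v, good, bad, c0):
    """Consistency of u -> v: mu(u) is ⊤, or ((G>) or (G>=)) and (B)."""
    a, b = mu[u], mu[v]
    if a is TOP:
        return True
    if b is TOP:
        return False
    g_ge = b[:len(a)] == a and colour(a, c0) in good[u]
    colours_above = ({c0} | {c for _, c in a}) - {BOT}
    return (a > b or g_ge) and colours_above.isdisjoint(bad[u])


def rabin_lifting(vertices, edges, controller, good, bad, tree, c0=0):
    """Algorithm 1: start from the map sending every vertex to the root and
    apply Lift_v to some inconsistent vertex v until none is left.
    `tree` lists the nodes of the labelled tree in increasing order."""
    succ = {v: [w for (x, w) in edges if x == v] for v in vertices}
    mu = {v: tree[0] for v in vertices}          # tree[0] is the root ()

    def lift(u, v):                              # lift_mu(u, v) by linear scan
        if mu[u] is TOP:
            return TOP
        for t in tree[tree.index(mu[u]):]:
            if edge_ok({**mu, u: t}, u, v, good, bad, c0):
                return t
        return TOP

    def vertex_ok(v):                            # consistency in the game
        checks = [edge_ok(mu, v, w, good, bad, c0) for w in succ[v]]
        return any(checks) if v in controller else all(checks)

    def rank(t):                                 # position in the tree, ⊤ last
        return len(tree) if t is TOP else tree.index(t)

    while True:
        v = next((x for x in vertices if not vertex_ok(x)), None)
        if v is None:
            return mu                            # all vertices are consistent
        candidates = [lift(v, w) for w in succ[v]]
        pick = min if v in controller else max   # Controller: min, Environment: max
        mu[v] = pick(candidates, key=rank)


vertices = ["a", "b", "c"]
edges = [("a", "a"), ("b", "b"), ("c", "a"), ("c", "b")]
good = {"a": {1}, "b": {1}, "c": set()}
bad = {"a": set(), "b": {1}, "c": set()}
tree = [(), ((1, 1),), ((1, BOT),)]
mu = rabin_lifting(vertices, edges, {"c"}, good, bad, tree)
print(mu)  # {'a': ((1, 1),), 'b': None, 'c': ((1, inf),)}
```

In this toy game, `b` ends at ⊤ because colour 1 is both good and bad for it, while Controller's vertex `c` obtains a finite value by moving to `a`.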
Remark 3. Let $\mathfrak{G}$ be a $(c_0, C)$-colourful Rabin game and $\mathcal{L}'$ an $L$-labelled $(c_0, C)$-colourful tree such that there is a Rabin measure $\mu'$ from $V$ to $\mathcal{L}'^{\top}$. If $\mathcal{L}$ embeds $\mathcal{L}'$, then there is also a Rabin measure $\mu$ into $\mathcal{L}^{\top}$ such that every vertex not mapped to $\top$ by $\mu'$ is also not mapped to $\top$ by $\mu$. This map is obtained by composing $\mu'$ with the embedding of $\mathcal{L}'$ into $\mathcal{L}$.


Runtime. For a finer analysis of the runtime, we need to understand the size of the lattice over which the lifting algorithm operates. In this section, however, we restrict ourselves to analysing the runtime of our algorithm for a fixed $\mathcal{L}$. We write $|\mathcal{L}|$ for the number of nodes in the labelled tree $\mathcal{L}$, $n$ for the number of vertices of a Rabin game, $m$ for the number of edges, and $k = |C \cup \{c_0\}|$ for the number of colours.


Lemma 1. Given a map $\mu$ from the vertices of a $(c_0, C)$-colourful Rabin game $\mathfrak{G}$ to an $L$-labelled $(c_0, C)$-colourful tree $\mathcal{L}$, the value $\mathrm{Lift}_v(\mu)(v)$ can be computed in time $O(\deg(v) \cdot T_{\mathrm{next}})$, where $\deg(v)$ is the out-degree (number of outgoing edges) of $v$ and $T_{\mathrm{next}}$ is the maximum of the times taken to

- make a linear pass over a node in $\mathcal{L}$ (assuming the node is represented as a sequence of elements of $L \times C$),

- compute the next node in $\mathcal{L}$, and

- find the next node that uses colours only from $C' \cup \{\bot\}$, for a given node $t \in \mathcal{L}$ and a subset of colours $C' \subseteq C$ such that $\mathrm{colour}(t) \in C'$.


Proof (sketch). The proof reduces to arguing carefully that, using the above items as subroutines, we can find, for each outgoing edge of $v$, the smallest node no smaller than $\mu(v)$ that satisfies condition $B$ along with at least one of $G_{>}$ or $G_{\geq}$. To satisfy condition $B$, we find the next node that does not use a bad colour of $v$ (using item 3 of the lemma), and then find its first child larger than the current value $\mu(v)$ that satisfies $G_{>}$ or $G_{\geq}$, using the operations described above. The exact details of this last step are provided in the full version of the paper.


Theorem 2. For a $(c_0, C)$-colourful Rabin game $\mathfrak{G}$ with $n$ vertices and $m$ edges, and an $L$-labelled $(c_0, C)$-colourful tree $\mathcal{L}$, Algorithm 1 on $(\mathfrak{G}, \mathcal{L})$ returns the smallest Rabin measure into $\mathcal{L}^{\top}$ in time $O(m|\mathcal{L}|\,T_{\mathrm{next}})$, where $T_{\mathrm{next}}$ is as defined in Lemma 1 and $|\mathcal{L}|$ denotes the number of nodes in $\mathcal{L}$.


Proof. First, we observe that applying $\mathrm{Lift}_v$ to the map strictly increases the value of a vertex $v$ that is not consistent. Each application of $\mathrm{Lift}_v$ makes at most $\deg(v)$ calls to $\mathrm{lift}_{\mu}(v, u)$, one per edge $v \to u$. By Lemma 1, computing $\mathrm{Lift}_v(\mu)(v)$ therefore takes time at most $\deg(v) \cdot T_{\mathrm{next}}$. Since each non-trivial application of $\mathrm{Lift}_v$ strictly increases the value that $v$ is mapped to, it can be performed at most as many times as there are nodes in $\mathcal{L}$. Hence the total time taken is

$$\sum_{v \in V} \deg(v)\,|\mathcal{L}|\,T_{\mathrm{next}} \in O(m\,|\mathcal{L}|\,T_{\mathrm{next}}),$$

where $m$ denotes the number of edges.
5 Small Colourful Universal Trees


In the previous section, we showed that our algorithm correctly identifies the smallest Rabin measure into a fixed labelled colourful tree $\mathcal{L}$. Moreover, by Theorem 1, there exists a Rabin measure into an $L$-labelled $(c_0, C)$-colourful tree with at most $n$ leaves: we only need $n$ leaves, corresponding exactly to the image of the Rabin measure. Therefore, for a Rabin game, there is a Rabin measure into such a tree (extended with $\top$) in which no start vertex from which Controller wins is mapped to $\top$. For the algorithm to determine the winner of every $(c_0, C)$-colourful Rabin game with $n$ vertices, the tree $\mathcal{L}$ used in Algorithm 1 must thus be able to embed all $(c_0, C)$-colourful trees with at most $n$ leaves (see Remark 3). Since the runtime depends linearly on the tree size, smaller trees with this property are desirable.

We now show how to obtain colourful universal trees, i.e., colourful trees that are large enough to embed any $(c_0, C)$-colourful tree with at most $n$ leaves. We also adapt the succinct universal trees of Jurdziński and Lazić [18] to encode each node of these colourful universal trees in polynomial space, which makes these labelled colourful trees efficient to navigate.


Colourful universal trees. A $(c_0, C)$-colourful tree $\mathcal{U}$ is $n$-universal if it embeds every $(c_0, C)$-colourful tree $\mathcal{T}$ with at most $n$ leaves. We henceforth assume that the set $C$ consists exactly of the colours $c_1, \ldots, c_h$, with the fixed ordering $c_1 < c_2 < \cdots < c_h$, and we use $k$ to denote $h + 1$.

A naive attempt at constructing an $n$-universal $(c_0, C)$-colourful tree could be to take all possible $(c_0, C)$-colourful trees with at most $n$ leaves and root colour $c_0$, and concatenate them. Clearly, such an $n$-universal tree can be created, as there are only finitely many such trees up to isomorphism (for a fixed $C$ and $n$). But of course, this tree is not only large, but can also be difficult to navigate. A more tractable attempt is to construct a tree that branches $n \cdot h$ times at the root. The subtrees at the root that arise from this $n \cdot h$ branching consist of $n$ repetitions of the $h$ colours $c_1, c_2, \dots, c_h$, in that order. Each of these children in turn branches $n \cdot (h-1)$ times in the same way, thus creating a tree whose size is bounded by $n^h h!$. We claim that such a tree is exactly the one underlying the algorithm of Piterman and Pnueli [27], which led to their $O(mn^{k+1}k\,k!)$ algorithm.
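The leaf count $n^h h!$ of this naive tree can be checked with a short Python sketch. It assumes, as described above, that a node with $j$ colours still available below it branches $n \cdot j$ times; the function name is ours and purely illustrative.

```python
from math import factorial, prod


def naive_universal_leaves(n: int, h: int) -> int:
    """Leaves of the naive tree: the root branches n*h times, its children
    n*(h-1) times, and so on, down to n*1 branches just above the leaves."""
    # Level i (0-based) multiplies the leaf count by n * (h - i).
    return prod(n * (h - i) for i in range(h))


# The product n*h * n*(h-1) * ... * n*1 equals n^h * h!.
for n in (1, 2, 4, 8):
    for h in range(6):
        assert naive_universal_leaves(n, h) == n**h * factorial(h)
```

For example, with $n = 8$ and $h = 3$ the naive tree already has $8^3 \cdot 3! = 3072$ leaves.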

Below, we give a more involved construction of a significantly smaller universal tree. In our construction, we inductively describe such a $(c_0, C)$-colourful $n$-universal tree, which we call $\mathcal{U}^{\ell}_{(c_0, C)}$, for a fixed $n = 2^{\ell}$.


- if $C = \emptyset$, then there is exactly one tree to embed, and therefore

$$\mathcal{U}^{\ell}_{(c_0, \emptyset)} = \big(c_0, \langle (\bot, \emptyset) \rangle\big);$$

- if $\ell = 0$, then the tree to be embedded has exactly one leaf and therefore, for each colour $c_i$ in $C$, we have a child of colour $c_i$ which hosts a subtree whose colour at the root is $c_i$. This is defined inductively as

$$\mathcal{U}^{0}_{(c_0, C)} = \big(c_0, \langle \mathcal{U}^{0}_{(c_1, C_1)}, \dots, \mathcal{U}^{0}_{(c_h, C_h)}, (\bot, \emptyset) \rangle\big),$$

where $C_i$ is $C \setminus \{c_i\}$.

Fig. 4: Inductive construction of a smaller colourful n-universal tree

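- if $C \neq \emptyset$ and $\ell > 0$, then we define the coloured tree to consist of two copies of an $n/2$-universal tree, and $h$ many copies of the $n$-universal tree where one colour is dropped each time. More formally,

$$\mathcal{U}^{\ell}_{(c_0, C)} = \mathcal{U}^{\ell-1}_{(c_0, C)} \cdot \big(c_0, \langle \mathcal{U}^{\ell}_{(c_1, C_1)}, \dots, \mathcal{U}^{\ell}_{(c_h, C_h)}, (\bot, \emptyset) \rangle\big) \cdot \mathcal{U}^{\ell-1}_{(c_0, C)}.$$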
In Fig. 4 we demonstrate how the inductive construction is done for a root colour $c_0$ and a set $C$ of three colours. To the left and right are the $(c_0, C)$-colourful $n/2$-universal trees, and between them there are $|C|$ many $n$-universal trees, each of which uses one fewer colour, and one node with just the dummy colour (represented there by $\circ$).
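The following Python sketch counts the leaves of $\mathcal{U}^{\ell}_{(c_0, C)}$ with $h = |C|$ colours by following the inductive description above. It rests on two assumptions of ours: concatenating trees that share the root colour $c_0$ merges their roots, so leaf counts simply add up, and the $\ell = 0$ case also carries the dummy leaf $(\bot, \emptyset)$. The helper names `leaves` and `leaf_bound` are hypothetical; the second evaluates the bound stated in Theorem 3 with $n = 2^{\ell}$.

```python
from functools import lru_cache
from math import comb, factorial


@lru_cache(maxsize=None)
def leaves(ell: int, h: int) -> int:
    """Leaf count of U^ell_(c0, C) for |C| = h, under the stated assumptions."""
    if h == 0:
        # C is empty: a single dummy leaf below the root.
        return 1
    if ell == 0:
        # One subtree U^0_(c_i, C_i) per colour, plus the dummy leaf.
        return h * leaves(0, h - 1) + 1
    # Two copies of the n/2-universal tree, h subtrees with one colour
    # dropped each time, and one dummy leaf in between.
    return 2 * leaves(ell - 1, h) + h * leaves(ell, h - 1) + 1


def leaf_bound(ell: int, h: int) -> int:
    """n * k! * min(n * 2^k, C(ell + k, k - 1)) with n = 2^ell and k = h + 1."""
    n, k = 2**ell, h + 1
    return n * factorial(k) * min(n * 2**k, comb(ell + k, k - 1))


for ell in range(8):
    for h in range(7):
        assert leaves(ell, h) <= leaf_bound(ell, h)
```

For instance, `leaves(3, 3)` evaluates to 1494 for $n = 8$ and $h = 3$, whereas the naive tree has $8^3 \cdot 3! = 3072$ leaves; the gap widens as $n$ grows.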


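Theorem 3. For $C \neq \emptyset$ and $k = |C| + 1$, the tree $\mathcal{U}^{\ell}_{(c_0, C)}$ constructed above is a $(c_0, C)$-colourful $n$-universal tree with at most

$$n \cdot k! \cdot \min\left\{ n 2^{k}, \binom{\ell + k}{k - 1} \right\}$$

many leaves, where $\ell = \lceil \log n \rceil$.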


Proof. Firstly, we need to show that $\mathcal{U}^{\ell}_{(c_0, C)}$ is a $(c_0, C)$-colourful $n$-universal tree. Then, we prove by induction that $\mathcal{U}^{\ell}_{(c_0, C)}$ has at most $2^{\ell} \cdot k! \cdot 2^{\ell + k}$ leaves, and later show that it also has at most $\binom{\ell + k}{k - 1} \cdot 2^{\ell} \cdot k!$ leaves, leading to the proof of our theorem.


In fact, we have a lower bound for $n$-universal $(c_0, C)$-colourful trees which is within a polynomial factor of the upper bound.

It is known from the work of Calude et al., as well as from Casares et al. [12], that there are no algorithms that solve Rabin games in time $n^{O(1)} \cdot 2^{o(k \log k)}$. Observe, however, that this does not exclude algorithms whose dependence on $k!$ has an exponent that is a constant smaller than 1. We have improved the current state of the art from $(k!)^{2 + o(1)}$ to $(k!)^{1 + o(1)}$. A natural question is whether the $k!$ component can be reduced further. We show below that our running time cannot be improved much further using our techniques.
Lemma 2 (Lower bound). Any $n$-universal $(c_0, C)$-colourful tree must have size at least $\binom{\ell + k - 2}{\ell} \cdot (k - 1)!$, where $k = |C| + 1$ and $\ell = \lfloor \log n \rfloor$.

Proof. Fix a permutation $c_{i_1}, \dots, c_{i_h}$ of the colours in $C$ and consider any tree with $n$ leaves where the order of colours from the root to each leaf is exactly the given permutation. Moreover, we assume that the leaves all have the same depth from the root. The part of the universal tree embedding all such trees must have size at least the size of a $2^{\ell}$-universal tree of height $h$ (defined for ordered trees without colours). Such universal trees have size at least $\binom{\ell + h - 1}{\ell}$ by the work of Czerwiński et al. [6]. Hence, for each choice of permutation, the universal tree restricted to that permutation must have size at least $\binom{\ell + h - 1}{\ell}$. Furthermore, two universal trees obtained by fixing different permutations cannot share a leaf, since distinct colours are assigned to some ancestor of such leaves. Therefore, we obtain a lower bound of $\binom{\ell + h - 1}{\ell} \cdot h!$ on the size of any $(c_0, C)$-colourful $n$-universal tree.


With $h = k - 1$, this immediately gives us the bound of Lemma 2 for $k = |C| + 1$. Our lower bound also matches one of the upper bounds of our construction up to a polynomial factor in $n$ and $k$.


Labelling Colourful Universal Trees. Here, we give a labelling of the universal colourful tree described in the previous section by giving a $W$-labelling of any $(c_0, C)$-colourful tree, where the set $W = \{0,1\}^*$. We let $\varepsilon$ denote the empty string in $\{0,1\}^*$. We define the ordering on $\{0,1\}^*$ as follows, similarly to the succinct encoding of ordered trees [18]: $0 \cdot w < \varepsilon < 1 \cdot w$ for every $w \in \{0,1\}^*$, and for $b_1, b_2 \in \{0,1\}$ we have $b_1 \cdot w_1 < b_2 \cdot w_2$ if and only if $b_1 < b_2$, or $b_1 = b_2$ and $w_1 < w_2$.
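One compact way to realise this ordering, sketched here in Python under an encoding of our own, is to map the bits 0 and 1 to the integers 0 and 2 and to end every word with 1, which plays the role of the empty suffix $\varepsilon$. Ordinary lexicographic comparison of the resulting tuples then compares the first bits and, on a tie, the remainders, exactly as in the recursive clause.

```python
def succinct_key(word: str) -> tuple:
    """Sort key with 0·w < ε < 1·w, and b1·w1 < b2·w2 iff b1 < b2, or b1 = b2 and w1 < w2."""
    if set(word) - {"0", "1"}:
        raise ValueError("words must be over {0, 1}")
    # Bit 0 -> 0, bit 1 -> 2, and a trailing 1 standing for the empty suffix.
    # No key is a proper prefix of another, so tuple comparison is exact.
    return tuple(0 if b == "0" else 2 for b in word) + (1,)


words = ["", "0", "1", "00", "01", "10", "11"]
print(sorted(words, key=succinct_key))  # ['00', '0', '01', '', '10', '1', '11']
```

The sorted list is the in-order traversal of a complete binary tree of depth 2 (left subtree, root, right subtree), which is what makes these words suitable as labels of ordered trees.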

Any node $t$ in a $W$-labelled $(c_0, C)$-colourful tree can be represented by a word generated by the following regular expression

$$\{0,1\}^* c_{i_1} \cdot \{0,1\}^* c_{i_2} \cdot \ldots \cdot \{0,1\}^* c_{i_m},$$

where $c_{i_j} \neq c_{i_{j'}}$ if $j \neq j'$, and $c_{i_j} = \bot$ if and only if $j = m$. We call the number of 0s and 1s occurring in the word the number of bits used to label $t$. We show in the following lemma that it is possible to label our colourful universal tree $\mathcal{U}^{\ell}_{(c_0, C)}$ such that the label of each node in it is "short".
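The regular expression above can be made concrete with a small Python sketch. Purely as an assumption for illustration, we represent a node word as a list of (bit block, colour) pairs and use the string "⊥" for the dummy colour; the hypothetical function `bits_used` checks the side conditions on the colours and returns the number of bits used to label the node, which is the quantity bounded in Lemma 3.

```python
BOTTOM = "⊥"  # the dummy colour that ends every node word


def bits_used(node):
    """Number of bits in a node word {0,1}* c_i1 · {0,1}* c_i2 · ... · {0,1}* c_im.

    `node` is a list of (bit block, colour) pairs, our own encoding.
    """
    colours = [c for _, c in node]
    if len(set(colours)) != len(colours):
        raise ValueError("colours in a node word must be pairwise distinct")
    # Together with distinctness, this makes ⊥ occur exactly at the end.
    if not colours or colours[-1] != BOTTOM:
        raise ValueError("the last colour must be the dummy colour")
    if any(set(bits) - {"0", "1"} for bits, _ in node):
        raise ValueError("bit blocks may contain only 0s and 1s")
    return sum(len(bits) for bits, _ in node)


print(bits_used([("01", "c2"), ("", "c1"), ("1", BOTTOM)]))  # 3
```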


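Lemma 3. There is a $W$-labelling of the tree $\mathcal{U}^{\ell}_{(c_0, C)}$, denoted by $\mathcal{L}^{\ell}_{(c_0, C)}$, such that the number of bits used to label any node of $\mathcal{L}^{\ell}_{(c_0, C)}$ is at most $\ell$.

Proof (sketch). For $\ell > 0$, we have

$$\mathcal{U}^{\ell}_{(c_0, C)} = \mathcal{U}^{\ell-1}_{(c_0, C)} \cdot \big(c_0, \langle \mathcal{U}^{\ell}_{(c_1, C_1)}, \dots, \mathcal{U}^{\ell}_{(c_h, C_h)}, (\bot, \emptyset) \rangle\big) \cdot \mathcal{U}^{\ell-1}_{(c_0, C)}.$$

We obtain recursively a labelling of $\mathcal{U}^{\ell-1}_{(c_0, C)}$ and prefix the bit 0 to the labels of the copy on the left and the bit 1 to the labels of the copy on the right. To all the labels of $\mathcal{U}^{\ell}_{(c_i, C_i)}$ we add the element $\varepsilon \cdot c_i$ as a prefix.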


We prove this rigorously in the full version of the paper, and only state here that the three operations defined in the statement of Lemma 1 can be computed in time $O(k \ell \log k)$ (denoted by $T_{\mathrm{next}}$), where $k = |C| + 1$.
Theorem 4. Finding the winner in a $(c_0, C)$-colourful Rabin game with $n$ vertices, $m$ edges, and $k = |C| + 1$ takes time

$$O\left(m\, n \log n \cdot k!\, k^2 \log k \cdot \min\left\{ n 2^{k}, \binom{\lceil \log n \rceil + k}{k - 1} \right\}\right)$$

and $O(nk \log k \log n)$ space.


Proof. We know from the previous section that the lifting Algorithm 1 for a $(c_0, C)$-colourful tree finds the Rabin measure into the tree $\mathcal{Z}$ in time $O(m |\mathcal{Z}| T_{\mathrm{next}})$. All that remains is to plug in the size of the universal tree obtained from Theorem 3. For a game with $n$ vertices, we instantiate the algorithm with $\mathcal{Z}$ being the $W$-labelling of the $(c_0, C)$-colourful $2^{\ell}$-universal tree $\mathcal{U}^{\ell}_{(c_0, C)}$ constructed above, where $\ell = \lceil \log n \rceil$. By Theorem 3, the tree therefore has at most $n \cdot k! \cdot \min\{n 2^{k}, \binom{\ell + k}{k - 1}\}$ many leaves, and hence at most $k$ times as many nodes, since each leaf has at most $k$ ancestors. Moreover, the time taken to navigate the tree, $T_{\mathrm{next}}$, is at most $O(k \ell \log k)$. The space required by the algorithm at each step is just the space required to store the map. To store a map, we need to store a node in the tree for each vertex. But from Lemma 3, storing each node only requires us to store a sequence of at most $k$ colours and at most $\log n$ bits. Since storing these $k$ colours takes $k \log k$ bits, the total space complexity is $O(k \log k \log n)$ for each of the $n$ vertices, giving us the desired space complexity.


6 Conclusions, Discussion and Future Work 


We have shown an algorithm for Rabin games that requires almost quadratic space and takes time that is polynomial in $n$ and $(k!)^{1 + o(1)}$. Significantly more asymptotic improvement to the running time may be difficult, as it was shown in the work of Calude et al. that there are no algorithms to solve Rabin games (as well as Muller games) in time $n^{O(1)} \cdot 2^{o(k \log k)}$ unless the Exponential Time Hypothesis fails (informally, the assumption that 3-SAT has no sub-exponential algorithms). Still, improvements in the exponent of the parameter $k!$, which contributes the majority of the running time, would prove useful in any algorithm that solves Rabin games. We have shown that using colourful universal trees cannot provide a significant improvement because of the $(k!)^{1 + o(1)}$ lower bound on the size of such a tree. However, any technique that improves this $1 + o(1)$ bound, even on a few targeted cases, could lead to faster algorithms. For instance, the recent unpublished work of Liang, Khoussainov, and Xiao [24] improves the running time for specific values of $k$, where $k$ is large (comparable to $n$).

While we focus on the theoretical advance in this paper, an obvious future direction is to implement the algorithm. There are tools that convert LTL specifications to Rabin automata, such as Rabinizer 4 [22]. It will be interesting to see whether solving the obtained Rabin games with our algorithm outperforms converting them instead to parity games and then using state-of-the-art parity game solvers such as the Oink framework. We believe that the improvement in space for solving Rabin games achieved in this paper might lead to more efficient algorithms for the problem of reactive synthesis of LTL formulas.

Our algorithm, like other progress measure algorithms, can display worst-case behaviour on certain asymmetric examples. To show that a vertex is losing for Controller, the measure needs to increase until it reaches $\top$. This asymmetric treatment of the players might lead to worst-case behaviour on several examples. But circumventing this problem by constructing similar measures for Environment, in the hope of finding a symmetric algorithm, is not straightforward, as Environment does not have a positional strategy in this game.

In a different direction, symbolic algorithms for parity games are either implicitly or explicitly guided by universal trees [3,19] constructed for both players. We believe that, with some effort, our small colourful universal trees can be exploited to build symbolic algorithms for Rabin games. One such algorithm would look like an asymmetric variation of the universal algorithm in the work of Jurdziński, Morvan, and Thejaswini for parity games, combined with our construction of colourful universal trees. Indeed, we already have a definition of colourful decompositions, which one might hope to obtain as the end result of such a recursive symbolic algorithm.
Abstract. State reachability for finite state concurrent programs running under Release-Acquire (RA) semantics is known to be undecidable,
while under a weaker variant, called Weak-Release-Acquire (WRA), the 
problem is decidable. However, WRA allows many counterintuitive be- 
haviors not allowed under RA, in which threads locally oscillate between 
observed values. We propose a strengthening of WRA in the form of 
a new memory model, which we call Localized Release-Acquire (LRA), 
that prunes these oscillatory behaviors. We provide semantics for LRA 
and show that verification under LRA is decidable by extending the 
potential-based technique used to prove decidability under WRA. The 
LRA model is still weaker than RA, and thus our results can be used to 
soundly verify programs under RA. 


Keywords: Relaxed Memory Concurrency · State Reachability · Release-Acquire Semantics


1 Introduction 


The Release-Acquire memory model (RA), a prominent fragment of the C/C++
shared-memory concurrency specifications from 2011 3] ep 7]g7], has recently 
gained a lot of attention (see, e.g., [23,25,30]). For programmers, RA combines the essential guarantees of coherence (a.k.a. "sequential consistency per-location") and causal consistency [10,20], which enable the implementation of various concurrent algorithms and synchronization mechanisms with very few barriers. For implementors, RA is weaker than the Total Store Order model (TSO) [29,31], which enables efficient mapping of memory accesses to Intel's x86 processors. Moreover, unlike TSO, RA is "monotone" [33], which, roughly speaking, means that replacing parallel composition with sequential composition can never introduce additional behaviors [26].

Unfortunately, the fundamental problem of state reachability in finite-state 
concurrent programs running under RA was recently shown to be undecidable Bl. 
This is in contrast with state reachability assuming the well-known model of sequential consistency (SC) [28], which amounts to standard reachability in a finite state system, as well as with state reachability assuming TSO, which was shown to be decidable using the framework of well-structured transition systems (WSTS) [1,13]. More recently, decidability of state reachability was
© The Author(s) 2024 
established for two variants of RA [22], called Strong Release-Acquire (SRA)
and Weak Release-Acquire (WRA), which bound RA from above (every behavior 
allowed by SRA is allowed by RA) and below (every behavior allowed by RA is 
allowed by WRA). In particular, verification under WRA can be used to obtain 
sound (but incomplete) verification under RA, since any buggy program under 
RA is also buggy under WRA. The gap, however, between WRA and RA includes 
some dubious behaviors: 


Example 1. The annotated behaviors in the three litmus tests below are allowed by WRA but disallowed by RA:


[Litmus tests: (Oscillation 1), (Oscillation 2), and (Oscillation 3).]


Intuitively speaking, a thread in WRA can “change its mind” about the order of 
concurrent writes. In RA, every shared variable is governed by a “modification 
order" which dictates the (globally agreed upon) order of concurrent writes, and 
reads have to respect that order. 


In this paper, we aim to narrow the gap between models with decidable 
reachability problem and RA by providing a model that lies between WRA and 
RA and still allows for decidable verification. More concretely, we propose to 
strengthen WRA in a way that eliminates the above oscillatory behaviors, while 
still (1) being weaker than RA and (2) inducing a decidable state reachability 
problem. The proposed model, which we call Localized Release-Acquire (LRA), 
is obtained by adding one constraint (a.k.a. axiom) to WRA's declarative consistency predicate. In turn, decidability is established similarly to [22], by carefully designing an operational "lossy" semantics based on maintaining thread potentials, so that it fits well in the framework of WSTS and is equivalent to LRA.
Our proof establishes the equivalence of the lossy potential-based system with 
LRA using forward simulation in one direction and backward simulation in the 
converse. 

The full version of this paper available in contains detailed proofs for the 
claims of the paper. 


2 Preliminaries 


In this section we present the formal preliminaries for our results, including the representation of concurrent programs, memory systems, and declarative execution graphs. We employ the following finite domains (and metavariables ranging over them):
  thread identifiers   $\tau, \pi \in \mathsf{Tid} = \{\mathtt{T}_1, \mathtt{T}_2, \dots\}$
  variables            $x, y \in \mathsf{Loc} = \{\mathtt{x}, \mathtt{y}, \dots\}$
  values               $v \in \mathsf{Val} \triangleq \{0, 1, 2, \dots\}$
We represent concurrent programs as labeled transition systems. A labeled transition system (LTS, for short) $A$ over an alphabet $\Sigma$ is a triple $\langle Q, Q_0, T \rangle$, where $Q$ is a set of states, $Q_0 \subseteq Q$ is the set of initial states, and $T \subseteq Q \times \Sigma \times Q$ is a set of transitions. We denote by $A.Q$, $A.Q_0$, and $A.T$ the three components of an LTS $A$; we write $\xrightarrow{\sigma}_A$ for the relation $\{\langle q, q' \rangle \mid \langle q, \sigma, q' \rangle \in A.T\}$ and $\to_A$ for $\bigcup_{\sigma \in \Sigma} \xrightarrow{\sigma}_A$. A state $q \in A.Q$ is reachable in $A$ if $q_0 \to_A^* q$ for some $q_0 \in A.Q_0$. A sequence $\sigma_1, \dots, \sigma_n$ is a trace of $A$ if $q_0 \xrightarrow{\sigma_1}_A q_1 \xrightarrow{\sigma_2}_A \cdots \xrightarrow{\sigma_n}_A q_n$ for some $q_0 \in A.Q_0$ and $q_1, \dots, q_n \in A.Q$.
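For finite LTSs, reachability and traces can be computed directly. The following Python sketch, with a hypothetical class name, stores the three components explicitly, computes the reachable states by breadth-first search, and checks a trace by tracking the set of states reachable along the given labels.

```python
from collections import deque


class LTS:
    """A finite LTS A = <Q, Q0, T>, with T a set of (q, sigma, q') triples."""

    def __init__(self, states, initial, transitions):
        self.Q, self.Q0, self.T = set(states), set(initial), set(transitions)

    def step(self, q, sigma=None):
        """Successors of q along label sigma, or along any label if sigma is None."""
        return {q2 for (q1, s, q2) in self.T if q1 == q and sigma in (None, s)}

    def reachable(self):
        """All q with q0 ->*_A q for some q0 in A.Q0 (breadth-first search)."""
        seen, todo = set(self.Q0), deque(self.Q0)
        while todo:
            for q2 in self.step(todo.popleft()):
                if q2 not in seen:
                    seen.add(q2)
                    todo.append(q2)
        return seen

    def is_trace(self, sigmas):
        """Is sigma_1, ..., sigma_n a trace of A?"""
        current = set(self.Q0)
        for s in sigmas:
            current = {q2 for q in current for q2 in self.step(q, s)}
        return bool(current)


A = LTS({0, 1, 2, 3}, {0}, {(0, "a", 1), (1, "b", 2), (2, "a", 1)})
print(A.reachable(), A.is_trace(["a", "b", "a"]))  # {0, 1, 2} True
```

State 3 is not reachable, and a sequence such as `["b"]` is not a trace because no initial state has a `b`-successor.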

For brevity, we elide the definition of how concurrent programs in a programming language are interpreted as LTSs (see for such a definition), but only note that these LTSs are finite-state and that they employ labels (a.k.a. "program transition labels") from the set $\mathsf{ProgLab} \triangleq \mathsf{Tid} \times (\mathsf{Lab} \cup \{\varepsilon\})$, where $\mathsf{Lab}$ denotes the set of action labels, representing interactions that a program may have with the memory, and $\varepsilon$ denotes a thread-internal transition. Action labels $l \in \mathsf{Lab}$ take one of the following forms: a read $\mathsf{R}(x, v_R)$, a write $\mathsf{W}(x, v_W)$, or a read-modify-write $\mathsf{RMW}(x, v_R, v_W)$, where $x \in \mathsf{Loc}$ and $v_R, v_W \in \mathsf{Val}$. The functions typ, loc, $\mathrm{val}_R$, and $\mathrm{val}_W$ respectively retrieve (when applicable) the type (R/W/RMW), variable ($x$), read value ($v_R$), and written value ($v_W$) of an action label. Furthermore, for a program transition label $\alpha \in \mathsf{ProgLab}$, the functions tid and lab respectively retrieve the thread identifier ($\tau$) and the action label (or $\varepsilon$) of $\alpha$, and the functions on action labels (typ, loc, ...) are lifted to program transition labels in the obvious way.

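To represent concurrent programs running under a particular memory model, we synchronize the transitions of a program $Pr$ with a memory system. A memory system is another LTS $\mathcal{M}$ (but possibly infinite-state) whose set of transition labels consists of non-silent program transition labels (elements of $\mathsf{Tid} \times \mathsf{Lab}$) as well as a (disjoint) set $\mathcal{M}.\Theta$ of memory-internal actions. Then, the composition of a program $Pr$ and a memory system $\mathcal{M}$, denoted by $Pr \bowtie \mathcal{M}$, is the LTS whose transition labels are the elements of $\mathsf{ProgLab} \cup \mathcal{M}.\Theta$; whose states are pairs $\langle p, M \rangle \in Pr.Q \times \mathcal{M}.Q$; whose initial states are $Pr.Q_0 \times \mathcal{M}.Q_0$; and whose transitions are given by:

$$\frac{\alpha \in \mathsf{Tid} \times \mathsf{Lab} \quad p \xrightarrow{\alpha}_{Pr} p' \quad M \xrightarrow{\alpha}_{\mathcal{M}} M'}{\langle p, M \rangle \xrightarrow{\alpha}_{Pr \bowtie \mathcal{M}} \langle p', M' \rangle}
\qquad
\frac{\alpha \in \mathsf{Tid} \times \{\varepsilon\} \quad p \xrightarrow{\alpha}_{Pr} p'}{\langle p, M \rangle \xrightarrow{\alpha}_{Pr \bowtie \mathcal{M}} \langle p', M \rangle}
\qquad
\frac{\alpha \in \mathcal{M}.\Theta \quad M \xrightarrow{\alpha}_{\mathcal{M}} M'}{\langle p, M \rangle \xrightarrow{\alpha}_{Pr \bowtie \mathcal{M}} \langle p, M' \rangle}$$

The state reachability problem for a memory system $\mathcal{M}$ receives as input a program $Pr$ and a state $p \in Pr.Q$, and asks whether $\langle p, M \rangle$ is reachable in $Pr \bowtie \mathcal{M}$ for some $M \in \mathcal{M}.Q$.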
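The three composition rules and the state reachability question can be sketched in Python for finite components. In this sketch the helper names are hypothetical, program labels are (thread, label) pairs with the string "eps" standing for $\varepsilon$, and the toy one-location memory whose state is simply the current value of x is an assumption made only for illustration.

```python
from collections import deque

EPS = "eps"  # stands for the thread-internal label ε


def compose_reachable(prog_init, prog_steps, mem_init, mem_steps, internal):
    """Reachable pairs <p, M> of the composition of a program and a memory system.

    prog_steps / mem_steps map a state to a list of (label, successor) pairs.
    Program labels are (tid, lab), with lab == EPS for internal steps;
    `internal` is the set of memory-internal actions.
    """
    start = {(p, m) for p in prog_init for m in mem_init}
    seen, todo = set(start), deque(start)
    while todo:
        p, m = todo.popleft()
        succ = []
        for alpha, p2 in prog_steps.get(p, []):
            if alpha[1] == EPS:  # second rule: the program moves alone
                succ.append((p2, m))
            else:                # first rule: synchronise on alpha in Tid x Lab
                succ += [(p2, m2) for beta, m2 in mem_steps.get(m, []) if beta == alpha]
        for beta, m2 in mem_steps.get(m, []):
            if beta in internal:  # third rule: the memory moves alone
                succ.append((p, m2))
        for s in succ:
            if s not in seen:
                seen.add(s)
                todo.append(s)
    return seen


def state_reachable(p, prog_init, prog_steps, mem_init, mem_steps, internal):
    """Is <p, M> reachable for some memory state M?"""
    pairs = compose_reachable(prog_init, prog_steps, mem_init, mem_steps, internal)
    return any(q == p for q, _ in pairs)


# Toy memory: its state is the current value of x (SC-like, one location).
mem = {v: [(("T1", ("W", "x", u)), u) for u in (0, 1)] + [(("T1", ("R", "x", v)), v)]
       for v in (0, 1)}
prog = {"p0": [(("T1", ("W", "x", 1)), "p1")],
        "p1": [(("T1", ("R", "x", 0)), "p2"), (("T1", ("R", "x", 1)), "p3")]}
print(state_reachable("p2", {"p0"}, prog, {0}, mem, set()))  # False
```

Here the program state `p2`, which requires reading 0 after writing 1, is unreachable, while `p3` is reachable.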


Finally, we also need the notion of a declarative memory model, which ac- 
cepts/rejects program behaviors based on constraints on the generated execution 
graphs. 


Definition 1. An execution graph G is a pair (E, rf), where: 


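- $E$ is a finite set of events. An event $e$ is a tuple $\langle \tau, s, l \rangle$, where $\tau \in \mathsf{Tid}$ is called the event's thread identifier, $s \in \mathbb{N}$ is called the event's serial identifier, and $l \in \mathsf{Lab}$ is called the event's label. The functions tid, sn, and lab return the thread identifier ($\tau$), serial identifier ($s$), and action label ($l$) of an event. All functions on action labels (typ, loc, ...) are lifted to events in the obvious way. We denote by $\mathbb{E}$ the set of all events, and define the following subsets:

$$\mathsf{R} \triangleq \{e \in \mathbb{E} \mid \mathrm{typ}(e) \in \{\mathsf{R}, \mathsf{RMW}\}\} \qquad \mathsf{W} \triangleq \{e \in \mathbb{E} \mid \mathrm{typ}(e) \in \{\mathsf{W}, \mathsf{RMW}\}\}$$
$$\mathsf{RMW} \triangleq \mathsf{R} \cap \mathsf{W} \qquad \mathbb{E}^{\tau} \triangleq \{e \in \mathbb{E} \mid \mathrm{tid}(e) = \tau\}$$

- rf is a reads-from relation for $E$, that is, a relation on $E$ satisfying:
  • If $\langle w, r \rangle \in \mathtt{rf}$, then $w \in \mathsf{W}$ and $r \in \mathsf{R}$.
  • If $\langle w, r \rangle \in \mathtt{rf}$, then $\mathrm{loc}(w) = \mathrm{loc}(r)$ and $\mathrm{val}_W(w) = \mathrm{val}_R(r)$.
  • $w_1 = w_2$ whenever $\langle w_1, r \rangle, \langle w_2, r \rangle \in \mathtt{rf}$ (each read reads from at most one write).
  • For every $r \in E \cap \mathsf{R}$, there exists some $w \in E$ such that $\langle w, r \rangle \in \mathtt{rf}$ (each read reads from some write).

We denote the components of $G$ by $G.E$ and $G.\mathtt{rf}$. For any set $E' \subseteq \mathbb{E}$, we write $G.E'$ for $G.E \cap E'$ (e.g., $G.\mathsf{W} = G.E \cap \mathsf{W}$). The program order induced by an execution graph $G$, denoted by $G.\mathtt{po}$, is defined as $G.\mathtt{po} \triangleq \{\langle e_1, e_2 \rangle \in E \times E \mid \mathrm{sn}(e_1) < \mathrm{sn}(e_2) \land \mathrm{tid}(e_1) = \mathrm{tid}(e_2)\}$.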


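Given a set $E$ of events, $\tau \in \mathsf{Tid}$, and $l \in \mathsf{Lab}$, $\mathrm{NextEvent}(E, \tau, l)$ denotes the event with thread identifier $\tau$, label $l$, and a minimal fresh serial identifier w.r.t. $E$, i.e., $\mathrm{NextEvent}(E, \tau, l) \triangleq \langle \tau, s, l \rangle$, where $s = \min\{n \in \mathbb{N} \mid \langle \tau, n, l \rangle \notin E\}$.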


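Definition 2. An execution graph $G$ is generated by a program $Pr$ with final state $p \in Pr.Q$ if $\langle p_0, G_0 \rangle \to^* \langle p, G \rangle$ for some $p_0 \in Pr.Q_0$, where $G_0$ denotes the empty execution graph (given by $G_0 \triangleq \langle \emptyset, \emptyset \rangle$) and $\to$ is defined by:

$$\frac{p \xrightarrow{\tau, l}_{Pr} p' \quad E' = E \cup \{\mathrm{NextEvent}(E, \tau, l)\} \quad \mathtt{rf} \subseteq \mathtt{rf}' \quad \langle E', \mathtt{rf}' \rangle \text{ is an execution graph}}{\langle p, \langle E, \mathtt{rf} \rangle \rangle \to \langle p', \langle E', \mathtt{rf}' \rangle \rangle}
\qquad
\frac{p \xrightarrow{\tau, \varepsilon}_{Pr} p'}{\langle p, G \rangle \to \langle p', G \rangle}$$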


Using the above definitions, a declarative memory model can be identified with a set of so-called consistent execution graphs, and a program state $p$ is *reachable* under a declarative memory model if some consistent execution graph $G$ is generated by $Pr$ with final state $p$.


3 The Localized Release-Acquire Model 


In this section we introduce the Localized Release-Acquire (LRA) model, starting with its declarative presentation. LRA is obtained by adding a single constraint, called local-read-coherence, to WRA. We first briefly repeat the three constraints of WRA (see for more details). Figure 1 summarizes the four constraints of LRA.
Fig. 1. Illustration of forbidden patterns in LRA (panels: weak-coherence, weak-atomicity, local-read-coherence)


Notation for relations. Given a relation $R$, $\mathrm{dom}(R)$ denotes its domain; $R^?$ and $R^+$ denote its reflexive and transitive closures; and $R^{-1}$ denotes its inverse. The (left) composition of relations $R_1, R_2$ is denoted by $R_1 ; R_2$. We denote by $[A]$ the identity relation on a set $A$ (e.g., $[A] ; R ; [B] = R \cap (A \times B)$).

First, we need a derived "happens-before" relation. For a given execution graph $G$, we define $G.\mathtt{hb} \triangleq (G.\mathtt{po} \cup G.\mathtt{rf})^+$. We require that $G.\mathtt{hb}$ is a partial order, which results in our first constraint:


G.hb is irreflexive (irr-hb) 
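For small, explicitly given graphs these definitions are easy to execute. The Python sketch below, with hypothetical helper names, computes $G.\mathtt{hb}$ as the transitive closure of $G.\mathtt{po} \cup G.\mathtt{rf}$ by a naive fixpoint and checks irr-hb; events are plain strings and relations are sets of pairs. As an illustration of our own (not taken from the text), it uses a load-buffering shape, where each thread reads the value that the other thread writes later in program order.

```python
from itertools import product


def transitive_closure(rel: set) -> set:
    """R^+ of a finite relation, computed by a naive fixpoint."""
    closure = set(rel)
    while True:
        step = {(a, d) for (a, b), (c, d) in product(closure, closure) if b == c}
        if step <= closure:
            return closure
        closure |= step


def happens_before(po: set, rf: set) -> set:
    """G.hb = (G.po ∪ G.rf)^+."""
    return transitive_closure(po | rf)


def irreflexive(rel: set) -> bool:
    return all(a != b for a, b in rel)


# T1: a:=x //1; y:=1     T2: b:=y //1; x:=1
po = {("Rx1", "Wy1"), ("Ry1", "Wx1")}
rf = {("Wx1", "Rx1"), ("Wy1", "Ry1")}
print(irreflexive(happens_before(po, rf)))  # False: the hb cycle violates irr-hb
```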


The next constraint intuitively makes sure that “a thread cannot read a value when it is aware of a later value written to the same location”, where “aware” and “later” are interpreted using $G.\mathtt{hb}$. Formally, we define $G.\mathtt{hb}|_{\mathtt{loc}} \triangleq \{\langle e_1, e_2 \rangle \in G.\mathtt{hb} \mid \mathrm{loc}(e_1) = \mathrm{loc}(e_2)\}$ (i.e., the per-location restriction of the happens-before relation), and require the following:

$$G.\mathtt{hb}|_{\mathtt{loc}} ; [\mathsf{W}] ; G.\mathtt{hb} ; G.\mathtt{rf}^{-1} \text{ is irreflexive} \qquad \text{(weak-coherence)}$$


In particular, the following annotated outcome of the message-passing (MP) test is forbidden:

(MP)    x:=1    ||    a:=y //1
        y:=1    ||    b:=x //0

with execution graph W(x,1) →po W(y,1) →rf R(y,1) →po R(x,0), where R(x,0) reads from the initial write W(x,0).

An execution graph justifying this outcome must have rf-edges as depicted above. However, we have $\mathtt{hb}|_{\mathtt{loc}}$ from W(x,0) to W(x,1), hb from W(x,1) to R(x,0), and rf from W(x,0) to R(x,0), which is forbidden by weak-coherence.
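The violation can be replayed mechanically. The Python sketch below, with hypothetical helper names, checks whether $\mathtt{hb}|_{\mathtt{loc}} ; [\mathsf{W}] ; \mathtt{hb} ; \mathtt{rf}^{-1}$ is irreflexive for the MP graph. As a modelling assumption of ours, the initial writes W(x,0) and W(y,0) are placed hb-before both threads by extra ordering edges; the outcome a=1, b=0 fails the check, while a=1, b=1 passes it.

```python
from itertools import product


def closure(rel):
    """Transitive closure R^+ (naive fixpoint)."""
    rel = set(rel)
    while True:
        new = {(a, d) for (a, b), (c, d) in product(rel, rel) if b == c} - rel
        if not new:
            return rel
        rel |= new


def seq(r1, r2):
    """Relational composition r1 ; r2."""
    return {(a, d) for (a, b), (c, d) in product(r1, r2) if b == c}


def weak_coherence_ok(events, po, rf):
    """Is hb|loc ; [W] ; hb ; rf^-1 irreflexive?  events: name -> (kind, loc)."""
    hb = closure(po | rf)
    hb_loc_to_w = {(a, b) for a, b in hb
                   if events[a][1] == events[b][1] and events[b][0] in ("W", "RMW")}
    rf_inv = {(r, w) for w, r in rf}
    return all(a != b for a, b in seq(seq(hb_loc_to_w, hb), rf_inv))


# MP: T1: x:=1; y:=1   T2: a:=y; b:=x, with the initial writes Wx0, Wy0
# ordered before both threads (our modelling assumption).
events = {"Wx0": ("W", "x"), "Wy0": ("W", "y"), "Wx1": ("W", "x"),
          "Wy1": ("W", "y"), "Ry": ("R", "y"), "Rx": ("R", "x")}
po = {("Wx0", "Wy0"), ("Wy0", "Wx1"), ("Wy0", "Ry"),
      ("Wx1", "Wy1"), ("Ry", "Rx")}
outcome_a1_b0 = {("Wy1", "Ry"), ("Wx0", "Rx")}
outcome_a1_b1 = {("Wy1", "Ry"), ("Wx1", "Rx")}
print(weak_coherence_ok(events, po, outcome_a1_b0))  # False
print(weak_coherence_ok(events, po, outcome_a1_b1))  # True
```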

The final condition that comes from WRA ensures that distinct RMW events never read from the same write event:

$$\forall \langle w_1, e_1 \rangle, \langle w_2, e_2 \rangle \in G.\mathtt{rf} ; [\mathsf{RMW}].\; w_1 = w_2 \implies e_1 = e_2 \qquad \text{(weak-atomicity)}$$


This concludes the consistency constraints of WRA. As noted above, unlike 
RA, WRA admits behaviors in which threads oscillate between values that were 
concurrently written to the same location. Our proposed condition of LRA that 
prunes these behaviors is the following: 


$$(G.\mathtt{hb}|_{\mathtt{loc}} \setminus G.\mathtt{rf}) ; [\mathsf{R}] ; G.\mathtt{hb} ; G.\mathtt{rf}^{-1} \text{ is irreflexive} \qquad \text{(local-read-coherence)}$$
Intuitively, this constraint ensures that a thread cannot read from a certain
write w if it is already aware of a read r’ reading from the same location that is 
later than w and reads from some other write w’. Again, “aware” and “later” are 
interpreted using G.hb. 

The following examples demonstrate “oscillations” between observed values 
that are allowed in WRA but forbidden in LRA. 


[Litmus tests (Oscillation 1), (Oscillation 2), (Oscillation 3), and (Order-Propagation), with threads T1, T2, T3, each shown with its execution graph.]


It can be checked that local-read-coherence forbids these execution graphs: in all of them we have (1) $G.\mathtt{hb}|_{\mathtt{loc}} \setminus G.\mathtt{rf}$ from W(x,1) to R(x,2); (2) $G.\mathtt{hb}$ from R(x,2) to the read R(x,1) that represents the read into c; and (3) rf from W(x,1) to that read.

Next, we establish the relation between LRA and RA (see for a definition 
of RA). 


Proposition 1. LRA is weaker than RA, that is: if a program state is reachable 
under RA, then it is also reachable under LRA. 


Proof. We establish this result by recalling the following read-coherence consistency constraint of RA (see Figure 2 and for more details). Note the use of the modification order $G.\mathtt{mo}$ in RA to interpret one write being "later" than another, in place of $G.\mathtt{hb}|_{\mathtt{loc}}$ in weak-coherence in WRA. Here $G.\mathtt{mo}$ is a disjoint union of relations $\{G.\mathtt{mo}_x\}_{x \in \mathsf{Loc}}$, where each $G.\mathtt{mo}_x$ is a strict total order on $G.\mathsf{W}_x$.

$$G.\mathtt{mo} ; G.\mathtt{hb} ; G.\mathtt{rf}^{-1} \text{ is irreflexive} \qquad \text{(read-coherence)}$$

Since WRA is strictly weaker than RA, it suffices to show that the additional constraint local-read-coherence of LRA is also guaranteed in RA. The proof proceeds by contradiction. Assume otherwise; hence, for some $x \in \mathsf{Loc}$, we have $w, w' \in \mathsf{W}_x$ and $r, r' \in \mathsf{R}_x$ where $\langle w, r' \rangle \in \mathtt{hb} \setminus \mathtt{rf}$, $\langle w', r' \rangle \in \mathtt{rf}$, $\langle r', r \rangle \in \mathtt{hb}$, $\langle w, r \rangle \in \mathtt{rf}$, and $w \neq w'$ (see the right side of Figure 2). Since $\mathrm{loc}(w) = x = \mathrm{loc}(w')$ and $\mathtt{mo}_x$ totally orders the writes to $x$, we have one of the following cases:
Fig. 2. Axiom read-coherence in RA and illustration for the proof of Proposition 1


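- $\langle w, w' \rangle \in \mathtt{mo}_x$: In this case we have $\langle w, r \rangle \in \mathtt{rf}$ while $\langle w, r \rangle \in \mathtt{mo}_x ; \mathtt{hb}$ (via $w' \xrightarrow{\mathtt{rf}} r' \xrightarrow{\mathtt{hb}} r$), which contradicts the axiom read-coherence of RA.
- $\langle w', w \rangle \in \mathtt{mo}_x$: In this case we have $\langle w', r' \rangle \in \mathtt{rf}$ while $\langle w', r' \rangle \in \mathtt{mo}_x ; \mathtt{hb}$ (via $w \xrightarrow{\mathtt{hb}} r'$), which again contradicts the axiom read-coherence of RA.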


To see that LRA is strictly weaker than RA, we note that LRA does not 
provide full coherence. Indeed, as the next example shows, even programs with 
a single shared variable can exhibit weak behaviors: 


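(WW)    x:=2        ||    x:=1
        a:=x //1    ||    b:=x //2

with execution graph W(x,2) →po R(x,1) in T1 and W(x,1) →po R(x,2) in T2, where R(x,1) reads from W(x,1) and R(x,2) reads from W(x,2).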
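To see all four axioms at work, the following Python sketch, with hypothetical helper names, checks irr-hb, weak-coherence, weak-atomicity, and local-read-coherence on a finite execution graph. Events are names mapped to (kind, location) pairs; values are omitted because the axioms only inspect rf. The (WW) graph above passes all four checks, matching the claim that LRA allows this outcome although RA forbids it.

```python
from itertools import product


def closure(rel):
    """Transitive closure R^+ (naive fixpoint)."""
    rel = set(rel)
    while True:
        new = {(a, d) for (a, b), (c, d) in product(rel, rel) if b == c} - rel
        if not new:
            return rel
        rel |= new


def seq(r1, r2):
    """Relational composition r1 ; r2."""
    return {(a, d) for (a, b), (c, d) in product(r1, r2) if b == c}


def lra_consistent(events, po, rf):
    """Check irr-hb, weak-coherence, weak-atomicity and local-read-coherence.

    events maps an event name to (kind, loc) with kind in {"R", "W", "RMW"}.
    """
    kind = {e: k for e, (k, _) in events.items()}
    hb = closure(po | rf)
    hb_loc = {(a, b) for a, b in hb if events[a][1] == events[b][1]}
    rf_inv = {(r, w) for w, r in rf}

    def irr(rel):
        return all(a != b for a, b in rel)

    to_writes = {(a, b) for a, b in hb_loc if kind[b] in ("W", "RMW")}
    to_reads = {(a, b) for a, b in hb_loc - rf if kind[b] in ("R", "RMW")}
    rmw_sources = [w for w, r in rf if kind[r] == "RMW"]
    return (irr(hb)                                        # irr-hb
            and irr(seq(seq(to_writes, hb), rf_inv))       # weak-coherence
            and len(rmw_sources) == len(set(rmw_sources))  # weak-atomicity
            and irr(seq(seq(to_reads, hb), rf_inv)))       # local-read-coherence


# (WW): T1: x:=2; a:=x //1    T2: x:=1; b:=x //2
ww_events = {"Wx2": ("W", "x"), "Rx1": ("R", "x"),
             "Wx1": ("W", "x"), "Rx2": ("R", "x")}
ww_po = {("Wx2", "Rx1"), ("Wx1", "Rx2")}
ww_rf = {("Wx1", "Rx1"), ("Wx2", "Rx2")}
print(lra_consistent(ww_events, ww_po, ww_rf))  # True
```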


Interestingly, our final example shows that the LRA model is possibly blocking: it may be the case that a thread simply cannot read from a certain location, since any option for reading would violate local-read-coherence:

(Blocking)    x:=2        ||    x:=1
              a:=x //1    ||    b:=x //2
              z:=1        ||    c:=z //1
                          ||    d:=x //nothing can be read

with execution graph W(x,2) →po R(x,1) →po W(z,1) in T1 and W(x,1) →po R(x,2) →po R(z,1) →po R(x,?) in T2, where R(x,1) reads from W(x,1), R(x,2) reads from W(x,2), and R(z,1) reads from W(z,1).

Roughly speaking, the synchronization on z "joins" the threads and rules out both options. More formally, if the final read reads from W(x,1), we violate local-read-coherence due to $G.\mathtt{hb}|_{\mathtt{loc}} \setminus G.\mathtt{rf}$ from W(x,1) to R(x,2) and $G.\mathtt{hb}$ from R(x,2) to the final read. In turn, if the final read reads from W(x,2), we violate it due to $G.\mathtt{hb}|_{\mathtt{loc}} \setminus G.\mathtt{rf}$ from W(x,2) to R(x,1) and $G.\mathtt{hb}$ from R(x,1) to the final read.

It is important to note that the blocking aspect of the LRA model does not affect the benefits of sound verification of RA programs using LRA, since (due to Proposition 1) outcomes forbidden in the LRA model (possibly due to a blocked run) are also forbidden in the RA model.
3.1 An Operational Presentation


Since LRA-consistency is “prefix-closed”, it is straightforward to “operationalize” LRA's declarative presentation, which will help us below in relating the potential-based model to LRA. To do so, we define a memory system, called opLRA, whose states are execution graphs, whose only initial state is the empty execution graph, and whose transitions are as follows:


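WRITE
$$\frac{e = \mathrm{NextEvent}(G.E, \tau, \mathsf{W}(x, v_W)) \qquad G' = \langle G.E \cup \{e\}, G.\mathtt{rf} \rangle}{G \xrightarrow{\tau, \mathsf{W}(x, v_W)}_{\mathrm{opLRA}} G'}$$

READ/RMW
$$\frac{\begin{array}{c}
l = \mathsf{R}(x, v_R) \lor l = \mathsf{RMW}(x, v_R, v_W) \qquad e = \mathrm{NextEvent}(G.E, \tau, l) \qquad G' = \langle G.E \cup \{e\}, G.\mathtt{rf} \cup \{\langle w, e \rangle\} \rangle \\
w \in G.\mathsf{W}_x \qquad \mathrm{val}_W(w) = v_R \\
w \notin \mathrm{dom}(G.\mathtt{hb}|_{\mathtt{loc}} ; [\mathsf{W}] ; G.\mathtt{hb}^? ; [\mathbb{E}^{\tau}]) \\
w \notin \mathrm{dom}((G.\mathtt{hb}|_{\mathtt{loc}} \setminus G.\mathtt{rf}) ; [\mathsf{R}] ; G.\mathtt{hb}^? ; [\mathbb{E}^{\tau}]) \\
\mathrm{typ}(l) = \mathsf{RMW} \implies w \notin \mathrm{dom}(G.\mathtt{rf} ; [\mathsf{RMW}])
\end{array}}{G \xrightarrow{\tau, l}_{\mathrm{opLRA}} G'}$$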
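The READ/RMW premises act as a filter on candidate writes. The Python sketch below, with hypothetical names, returns the writes of $G$ that thread $\tau$ may read at location $x$; values are omitted, so the premise $\mathrm{val}_W(w) = v_R$ is not checked. On the (Blocking) graph before the final read, it confirms that T2 has no write to x left to read, while T1 could still read W(x,1).

```python
from itertools import product


def closure(rel):
    """Transitive closure R^+ (naive fixpoint)."""
    rel = set(rel)
    while True:
        new = {(a, d) for (a, b), (c, d) in product(rel, rel) if b == c} - rel
        if not new:
            return rel
        rel |= new


def readable_writes(events, po, rf, tid, loc, is_rmw=False):
    """Writes w satisfying the opLRA READ/RMW premises for thread `tid` at `loc`.

    events maps a name to (thread, kind, location).
    """
    hb = closure(po | rf)
    hb_refl = hb | {(e, e) for e in events}                    # G.hb?
    mine = {e for e, (t, _, _) in events.items() if t == tid}  # G.E^tid
    hb_loc = {(a, b) for a, b in hb if events[a][2] == events[b][2]}

    def dom_via(rel, kinds):
        # dom(rel ; [kinds] ; G.hb? ; [E^tid])
        return {a for a, b in rel if events[b][1] in kinds
                and any((b, e) in hb_refl for e in mine)}

    excluded = (dom_via(hb_loc, {"W", "RMW"})
                | dom_via(hb_loc - rf, {"R", "RMW"}))
    candidates = {w for w, (_, k, l) in events.items()
                  if k in ("W", "RMW") and l == loc}
    if is_rmw:  # an RMW may not read a write already read by another RMW
        candidates -= {w for w, r in rf if events[r][1] == "RMW"}
    return candidates - excluded


# (Blocking) graph before d:=x; T1: x:=2; a:=x; z:=1   T2: x:=1; b:=x; c:=z
events = {"Wx2": ("T1", "W", "x"), "Ra": ("T1", "R", "x"), "Wz1": ("T1", "W", "z"),
          "Wx1": ("T2", "W", "x"), "Rb": ("T2", "R", "x"), "Rc": ("T2", "R", "z")}
po = {("Wx2", "Ra"), ("Ra", "Wz1"), ("Wx1", "Rb"), ("Rb", "Rc")}
rf = {("Wx1", "Ra"), ("Wx2", "Rb"), ("Wz1", "Rc")}
print(readable_writes(events, po, rf, "T2", "x"))  # set()
```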


These transitions enforce consistency at every step, which allows us to establish the following relation.


Proposition 2. LRA is equivalent to opLRA, that is: a program state is reach- 
able under LRA iff it is reachable under opLRA. 


4 Lossy semantics for LRA 


In this section, we present loLRA, a potential-based memory system that is 
equivalent to LRA and well suited for verification in the framework of WSTS. 

The memory states of loLRA maintain a collection of "read/write-option" lists for each thread, called the potential of the thread. Concretely, a state of loLRA is a potential mapping $\mathcal{B}$, which maps each thread $\tau \in \mathsf{Tid}$ to its potential $\mathcal{B}(\tau)$. Potentials are finite sets of option lists, where each option list stands for a sequence of possible future reads (read options) and writes (write options) that ascribe possible operations the thread may perform, in the order in which it may perform them. For instance, a list $o_1 \cdot o_2$ consisting of two read options, $o_1$ and $o_2$, allows the thread to read $\mathrm{val}(o_1)$ from location $\mathrm{loc}(o_1)$ and then $\mathrm{val}(o_2)$ from location $\mathrm{loc}(o_2)$. Thread potentials are explicitly "lossy": a thread can non-deterministically lose any part of its potential at any point. Initially, the loLRA memory system non-deterministically starts in a state where all potentials consist solely of write options.

Next, we present the full definitions (which, except for loLRA's transitions, match precisely the definitions of the corresponding system for WRA in [22]).
Notation for sequences. We use $\varepsilon$ to denote the empty sequence. The length of a sequence $s$ is denoted by $|s|$ (in particular, $|\varepsilon| = 0$). We often identify a sequence $s$ over $\Sigma$ with its underlying function in $\{1, \dots, |s|\} \to \Sigma$, and write $s(k)$ for the symbol at position $1 \le k \le |s|$ in $s$. We write $\sigma \in s$ if the symbol $\sigma$ appears in $s$, that is, if $s(k) = \sigma$ for some $1 \le k \le |s|$. We use "$\cdot$" for the concatenation of sequences, and lift it to the concatenation of sets $S_1$ and $S_2$ of sequences in the obvious way ($S_1 \cdot S_2 \triangleq \{s_1 \cdot s_2 \mid s_1 \in S_1, s_2 \in S_2\}$). We identify symbols with sequences of length 1 or their singletons when needed (e.g., in expressions like $\sigma \cdot S$ for $\sigma \in \Sigma$ and a set $S$ of sequences over $\Sigma$).


Definition 3. Options, option lists, potentials, and potential mappings are de- 
fined as follows: 


1. An option $o$ is either $\langle \tau, x, v, \tau_{\mathrm{RMW}} \rangle$ (read option) or $O_W(x)$ (write option), where $\tau, \tau_{\mathrm{RMW}} \in \mathsf{Tid}$, $x \in \mathsf{Loc}$, and $v \in \mathsf{Val}$. The functions typ, tid, loc, val, and rmw-tid return (when applicable) the type (R/W), thread identifier ($\tau$), location ($x$), value ($v$), and RMW thread identifier ($\tau_{\mathrm{RMW}}$) of a given option.

2. An option list $L$ is a finite sequence of (read or write) options. For a given option list $L$, we define $\mathrm{loc}(L) \triangleq \{\mathrm{loc}(o) \mid o \in L\}$.

3. A potential $B$ is a finite non-empty set of option lists.

4. A potential mapping $\mathcal{B}$ is a function assigning a potential to every $\tau \in \mathsf{Tid}$.


We define a (well quasi) ordering on option lists that naturally extends to 
potentials and to potential mappings. 


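Definition 4. The (overloaded) relation $\sqsubseteq$ is defined by:

1. on option lists: $L \sqsubseteq L'$ if $L$ is a (not necessarily contiguous) subsequence of $L'$;
2. on potentials: $B \sqsubseteq B'$ if $\forall L \in B.\ \exists L' \in B'.\ L \sqsubseteq L'$ (a.k.a. "Hoare ordering");
3. on potential mappings: $\mathcal{B} \sqsubseteq \mathcal{B}'$ if $\mathcal{B}(\tau) \sqsubseteq \mathcal{B}'(\tau)$ for every $\tau \in \mathsf{Tid}$ (componentwise order).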
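Definition 4 translates directly into code. In the Python sketch below, options are encoded, as an assumption of ours, as tuples such as ("W", "x") for $O_W(x)$ and ("R", τ, x, v, τ_RMW) for a read option; potentials are sets of tuples, and potential mappings are dictionaries from thread names to potentials over the same threads. The function names are hypothetical.

```python
def list_leq(l1, l2) -> bool:
    """L ⊑ L': L is a (not necessarily contiguous) subsequence of L'."""
    it = iter(l2)
    # `o in it` advances the iterator past the first match, so the
    # options of l1 must be found in l2 in the same order.
    return all(o in it for o in l1)


def potential_leq(b1, b2) -> bool:
    """Hoare ordering: every list of B is below some list of B'."""
    return all(any(list_leq(l1, l2) for l2 in b2) for l1 in b1)


def mapping_leq(m1, m2) -> bool:
    """Componentwise order on potential mappings over the same threads."""
    return all(potential_leq(m1[t], m2[t]) for t in m1)


wx = ("W", "x")
ry = ("R", "T2", "y", 1, "T1")
print(list_leq((wx,), (ry, wx)), list_leq((wx, ry), (ry, wx)))  # True False
```

Because a lossy step may replace $\mathcal{B}$ by any $\mathcal{B}' \sqsubseteq \mathcal{B}$, this ordering is exactly what "losing parts of a potential" means.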


The memory system loLRA is formally defined as follows.
Definition 5. The memory system loLRA is defined by:

- loLRA.Q is the set of potential mappings.
- $\mathrm{loLRA}.Q_0 \triangleq \{\mathcal{B} \mid \forall \tau \in \mathsf{Tid}, L \in \mathcal{B}(\tau), o \in L.\ \mathrm{typ}(o) = \mathsf{W}\}$.
- The transitions of loLRA are given in Figure 3.
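As a preview of the READ rule of Figure 3, which is described informally below, the following Python sketch performs one READ step on a potential mapping: all lists of the reading thread must start with the same read option for the given location and value, and the step removes that option from each of them. The tuple encoding of options, ("R", writer, x, v, τ_RMW) and ("W", x), and the function name are our assumptions; the lossy and WRITE steps are not modelled.

```python
def read_step(mapping, tid, loc, val):
    """Return the potential mapping after `tid` reads `val` from `loc`,
    or None when the READ step is not enabled."""
    lists = mapping[tid]
    heads = {lst[0] if lst else None for lst in lists}
    if len(heads) != 1:
        return None  # the lists of `tid` do not all start with the same option
    (o,) = heads
    if o is None or o[0] != "R" or o[2] != loc or o[3] != val:
        return None  # the shared head is not a read option for (loc, val)
    updated = dict(mapping)  # other threads keep their potentials
    updated[tid] = frozenset(lst[1:] for lst in lists)  # consume the head option
    return updated


r = ("R", "T2", "x", 1, "T2")
w = ("W", "y")
mapping = {"T1": frozenset({(r, w), (r,)}), "T2": frozenset({(w,)})}
print(read_step(mapping, "T1", "x", 1))
```

Here thread T1 can read 1 from x, leaving it with the lists `(("W", "y"),)` and the empty list, whereas reading 2, or any read by T2 (whose lists start with a write option), is not enabled.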


The transitions of loLRA are informally understood as follows: 


- READ: For a thread $\tau$ to read $v$ from $x$, all lists of $\tau$ should start with an option $o$ with $\mathrm{val}(o) = v$ and $\mathrm{loc}(o) = x$ (since it is the same option $o$ at the head of all lists, all lists of $\tau$ also start with the same thread identifier,
which is important for the equivalence result; see Example 5.5]). The 
read step consumes these options by discarding the first element from each 
of 7’s lists. 
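WRITE
    $o = \langle \tau, x, v, \tau_{\mathrm{RMW}} \rangle$
    $\forall \pi \in \mathsf{Tid}, L' \in B'(\pi).$
    $\quad \big((\pi = \tau \Rightarrow O_w(x) \cdot L' \in B(\tau)) \land (\pi \neq \tau \Rightarrow L' \in B(\pi))\big) \lor {}$
    $\quad \exists L_0, \dots, L_n.\ L' = L_0 \cdot (o \cdot L_1) \cdot (o \cdot L_2) \cdots (o \cdot L_n) \land {}$
    $\qquad O_w(x) \cdot (L_1 \cdots L_{n-1}) \cdot O_w(x) \cdot L_n \in B(\tau) \land {}$
    $\qquad (\pi = \tau \Rightarrow O_w(x) \cdot L_0 \cdots L_{n-1} \cdot O_w(x) \cdot L_n \in B(\tau) \land x \notin \mathrm{loc}(L_0 \cdots L_{n-1})) \land {}$
    $\qquad (\pi \neq \tau \Rightarrow L_0 \cdots L_{n-1} \cdot O_w(x) \cdot L_n \in B(\pi) \land x \notin \mathrm{loc}(L_1 \cdots L_{n-1}))$
    ----------------------------------------
    $B \xrightarrow{\tau, W(x,v)}_{\mathrm{loLRA}} B'$

READ
    $\mathrm{loc}(o) = x \quad \mathrm{val}(o) = v_R \quad B = B'[\tau \mapsto o \cdot B'(\tau)]$
    ----------------------------------------
    $B \xrightarrow{\tau, R(x,v_R)}_{\mathrm{loLRA}} B'$

RMW
    $\mathrm{loc}(o) = x \quad \mathrm{val}(o) = v_R \quad \mathrm{rmw\text{-}tid}(o) = \tau \quad B = B_{\mathrm{mid}}[\tau \mapsto o \cdot B_{\mathrm{mid}}(\tau)] \quad B_{\mathrm{mid}} \xrightarrow{\tau, W(x,v_W)}_{\mathrm{loLRA}} B'$
    ----------------------------------------
    $B \xrightarrow{\tau, RMW(x,v_R,v_W)}_{\mathrm{loLRA}} B'$

LOWER
    $B' \sqsubseteq B$
    ----------------------------------------
    $B \xrightarrow{\varepsilon}_{\mathrm{loLRA}} B'$


Fig. 3. Transitions of the loLRA memory system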


— WRITE: For a thread $\tau$ to write $v$ to $x$, an option $O_w(x)$ must be the first element in each of $\tau$'s lists. The WRITE step consumes these options, discarding the first element from each of $\tau$'s lists. To allow future reads from the executed write, the write may add a read option $o$ with $\mathrm{loc}(o) = x$, $\mathrm{val}(o) = v$, $\mathrm{tid}(o) = \tau$, and some $\mathrm{rmw\text{-}tid}(o)$ (possibly multiple times) in every existing list of every thread (including the writer itself). The WRITE step enforces carefully tailored conditions on where these new options are added:

1. In the potential of the writer itself, a new option cannot be added after an existing write option to $x$ (except for the write option that is consumed in this write step), and the last added read option should immediately precede an existing write option to $x$.

2. In the potential of other threads, the last added read option should immediately precede an existing write option to $x$ that is to be consumed by the current write step.

3. If more than one option is added, the added read options can never "surround" an existing read/write option with location $x$.

4. New read options can be placed in a list $L$ only if the suffix of $L$ after the first occurrence of the newly added read options is present as an option list of the writing thread $\tau$.

RMW: The only additional requirement when performing an RMW, compared to an uninterrupted execution of a read followed by a write, is that two RMWs should never read from the same event. This is achieved by including RMW thread identifiers in read options, denoting the (unique) thread that may consume this option when executing an RMW. When a thread writes, it picks an (arbitrary) unique thread identifier ($\tau_{\mathrm{RMW}}$) for its added options; reads ignore this field; and RMWs by thread $\tau$ can only consume read options whose RMW thread identifier is $\tau$.
[Figure 4: loLRA states for the MP program (T1: W(x,0); W(x,1); W(y,1), and T2: R(y,1); R(x,0)), shown in panels 1(a), 2(a), 2(b), 3(a), and 4(a) with the option lists of T1 and T2.]


Fig. 4. This figure shows the loLRA transitions for the MP program. Here the dashed line in 1(a) between $O_w(x)$ of T1 and $O_w(x)$ of T2 indicates that a future write W(x,0) of T1 (see 2(a)) may replace the $O_w(x)$ of T2 with a read option $o_{x,0}$. We follow a similar depiction in all the remaining diagrams of the paper.


— LOWER: This step allows removing read/write options, as well as entire option lists, at any time.
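Here is a minimal Rust sketch of the READ step for a single thread. It assumes a hypothetical encoding of options as plain Rust values; the names `Opt`, `read_step`, and `rmw_read_half` are ours, not from the paper. The first function checks that all of the thread's lists start with the same read option o with loc(o) = x and val(o) = v, and then drops that head from every list. The second function adds the RMW-specific check rmw-tid(o) = τ. The write half of an RMW, the WRITE step itself, and LOWER are not modelled.

```rust
/// Hypothetical encoding of options (locations as chars, values as i64).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Opt {
    Read { tid: u32, loc: char, val: i64, rmw_tid: u32 },
    Write { loc: char },
}

/// READ step for one thread: every list must start with the same read option
/// o with loc(o) = x and val(o) = v; the result drops that head from every
/// list. Returns `None` if the step is not enabled.
fn read_step(lists: &[Vec<Opt>], x: char, v: i64) -> Option<Vec<Vec<Opt>>> {
    let head = lists.first()?.first().copied()?;
    match head {
        Opt::Read { loc, val, .. } if loc == x && val == v => {}
        _ => return None,
    }
    if lists.iter().any(|l| l.first() != Some(&head)) {
        return None;
    }
    Some(lists.iter().map(|l| l[1..].to_vec()).collect())
}

/// Read half of an RMW by thread `tau`: additionally requires rmw-tid(o) = tau.
/// (The subsequent WRITE step is not modelled in this sketch.)
fn rmw_read_half(lists: &[Vec<Opt>], tau: u32, x: char, v: i64) -> Option<Vec<Vec<Opt>>> {
    match lists.first()?.first()? {
        Opt::Read { rmw_tid, .. } if *rmw_tid == tau => read_step(lists, x, v),
        _ => None,
    }
}
```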


We revisit the examples from [3] to illustrate that loLRA forbids those outcomes. In the following discussion, the shaded portions of the diagram for each thread correspond to its option lists. We write $o_{x,v}$ to represent a read option $o$ with $\mathrm{loc}(o) = x$ and $\mathrm{val}(o) = v$.


Example 2. Recall the execution graph of MP (see Figure 4). Since no step in loLRA can introduce a write option, we observe the following facts about the option lists $L_1 \in B_0(\mathrm{T1})$ and $L_2 \in B_0(\mathrm{T2})$, where $B_0$ may lead to the annotated program state (a = 1 and b = 0) using a trace in which $L_1$ and $L_2$ are not discarded by a LOWER step:


1. $L_1$ contains $O_w(x) \cdot O_w(x) \cdot O_w(y)$ as a sub-list to enable W(x,0), W(x,1), and W(y,1) in T1.

2. For the reads R(y,1) and R(x,0) to happen, the corresponding writes W(y,1) and W(x,0) need to insert read options $o_{y,1}$ and $o_{x,0}$ at these locations (see READ step).

3. $L_2$ contains $O_w(y)$ followed by $O_w(x)$ to enable future insertions of read options $o_{y,1}$ and $o_{x,0}$ by the writes W(y,1) and W(x,0), respectively (see condition 2 of the WRITE step).


Starting in the state $B_0$ (1(a) in Figure 4), one can reach state 3(a) through state 2(a) in two successive steps corresponding to the execution of the first two writes, W(x,0) and W(x,1) of T1, where the first write W(x,0) replaces $O_w(x)$ in the option list of T2 with a read option $o_{x,0}$, resulting in $L' = L_2[O_w(x) \mapsto o_{x,0}]$.
[Figure 5 panels: program (Oscillation 1) with loLRA states 1(a) and 1(b) over threads T1 and T2; program (Oscillation 2) with loLRA states 2(a) and 2(b) over threads T1, T2, and T3.]


Fig. 5. loLRA transitions for Oscillation 1 and Oscillation 2 (Example 3).


In the next step (shown as 4(a)), we would like to perform the write W(y,1) in T1 and replace $O_w(y)$ in T2 with the read option $o_{y,1}$. However, the current write step requires that the suffix of $L'$ after $O_w(y)$ (here, $o_{x,0}$) be present as an option list of thread T1 (due to condition 4 of the WRITE step). This is clearly not the case, and hence we cannot continue with the current execution trace. To circumvent this blocking run, the first write W(x,0) of T1 might non-deterministically insert a read option $o_{x,0}$ at the specified location in its own option list (see 2(b)). However, due to the presence of an earlier $O_w(x)$ in the option lists of T1, this is not allowed (condition 1 of the WRITE step). Therefore, the loLRA semantics successfully forbids the annotated outcome of the message passing test.


Example 3. Recall the execution graphs of (Oscillation 1) and (Oscillation 2) (see Figure 5), where T2 oscillates between the observed values of x. Consider the following two cases (and the corresponding execution graphs) to observe a contradiction for each possible trace of loLRA:


— W(x,1) executes before W(x,2): For (Oscillation 1) and (Oscillation 2), this is depicted as 1(a) and 2(a) of Figure 5, respectively. Note the presence of $O_w(x)$ at the specified locations in the option lists of thread T2 to mark the end of new read options due to the future write W(x,2). In the current state of (Oscillation 1), the write W(x,1) of thread T2 is not allowed to put a read option in its own option list due to the presence of an earlier $O_w(x)$ (see condition 1 of the WRITE step). Similarly, in the current state of (Oscillation 2), the write W(x,1) of thread T3 cannot place new read options in the list of thread T2 because $O_w(x)$ appears between the new read options (see condition 3 of the WRITE step).

— W(x,1) executes after W(x,2): For (Oscillation 1) and (Oscillation 2), this is depicted as 1(b) and 2(b) of Figure 5, respectively. Note the presence of $o_{x,2}$ (instead of $O_w(x)$ in the previous case) at the specified location in the option lists of thread T2, which allows the read R(x,2) to later read from the write W(x,2) of T1. Again, in the states corresponding to 1(b) and 2(b), due to conditions 1 and 3 of the WRITE step, W(x,1) is not allowed to put new read options at the specified locations.


[Figure 6 panels: program (Oscillation 3) over threads T1, T2, and T3, with loLRA states 3(a) and 3(b).]


Fig. 6. The loLRA transitions for the program Oscillation 3 (Example 4).


Example 4. Recall the execution graph of (Oscillation 3) from [3], where T2 oscillates between the observed values of x (see Figure 6). We consider the following two cases and the resulting execution graphs, based on the order of execution between the write events W(x,1) and W(x,2), to observe a contradiction in each trace of loLRA:


— W(x,1) executes before W(x,2): This case is depicted as 3(a). Note the presence of $O_w(y)$ and $O_w(x)$ at the specified locations in the option list of T2. They mark the end of new read options due to the future writes W(y,1) of T3 and W(x,2) of T1, respectively. Also note the presence of $O_w(x)$ in the option lists of T3. We claim that this $O_w(x)$ is needed as justification for the future write W(y,1) of T3, which replaces the write option $O_w(y)$ on T2 with the read option $o_{y,1}$. To justify the claim, assume otherwise (i.e., $O_w(x)$ is absent from the option list of T3). We observe that W(y,1) of T3 then cannot continue in any of the following possible cases:

• W(x,2) has not occurred when W(y,1) tries to execute: In this case, $O_w(x)$ is still present in the option list of T2. Hence it is required at the specified location in the option list of T3 as a justification for the current write W(y,1) (see condition 4 of the WRITE step). Therefore, the write W(y,1) cannot continue in this case.

• W(x,2) has occurred when W(y,1) tries to execute: In this case, $O_w(x)$ on T2 has been replaced with $o_{x,2}$. Hence $o_{x,2}$ is also expected in the option list of T3, as justification for the current WRITE step W(y,1). However, only the corresponding write W(x,2) can place $o_{x,2}$ in T3, by inserting it as a new read option. The write W(x,2) cannot add $o_{x,2}$ at the specified location, because no $O_w(x)$ is present there to mark the end of newly added read options (see condition 2 of the WRITE step). Hence, in this case, too, the write W(y,1) cannot continue.

Assuming the presence of $O_w(x)$ in the option list of T3 (as shown in 3(a)), it is easy to see that W(x,1) of T3 cannot put a read option in its own option list (see condition 1 of the WRITE step). Such a read option is necessary as justification for the future write W(y,1) of T3, by arguments similar to those above. Therefore, the current case is forbidden by the lossy loLRA semantics.
— W(x,1) executes after W(x,2): This case is depicted as 3(b), where the write W(x,2) of T1 has replaced the write option $O_w(x)$ in the option lists of T2 with a read option $o_{x,2}$. Again, as discussed in the previous case (for justifying the future write W(y,1) of T3), the write W(x,2) of T1 should also place a read option $o_{x,2}$ in the option lists of T3 at the specified location. Now, as shown in 3(b), the write W(x,1) of T3 cannot put a read option in its own option list, due to the presence of an earlier $o_{x,2}$. Such a read option is necessary as justification for the future write W(y,1) of T3. Thus, the current case is also forbidden by the lossy loLRA semantics.


Fig. 7. The loLRA transitions for the program Oscillation 4 (Example 5).


In the discussions so far (particularly related to cases 1(a), 2(a), and 3(a) of the previous examples), we observed that marking the end of newly added read options (using a pre-existing write option) is helpful in forbidding oscillations. In all of these cases, analogous arguments show that we could also forbid these oscillatory behaviors another way. Conditions 1 and 2 of the WRITE step could instead require that the beginning of newly added read options be marked using a pre-existing write option. The next example illustrates the distinctive advantage of marking the end over marking the beginning.


Example 5. Consider the execution graph (Osc 4) corresponding to the annotated outcome of (Oscillation 4) shown in Figure 7. LRA forbids this execution graph, since we have (1) $G.\mathtt{hb}|_{\mathtt{loc}} \cup G.\mathtt{rf}$ from W(x,1) to the third read R(x,2) of T2; (2) $G.\mathtt{hb}$ from the third read R(x,2) of T2 to the last read R(x,1) of T2; and (3) $G.\mathtt{rf}$ from W(x,1) to the last read R(x,1) of T2.

Consider the following two possibilities (4(b) and 4(a) of Figure 7) corresponding to this outcome, where: (1) W(x,1) executes after W(x,2); and (2) W(x,1) executes before W(x,2).

Assuming (1) and using arguments similar to Example 4, we land in configuration 4(b), which is not allowed by the lossy loLRA semantics. Now assume (2). In 4(a), the write W(x,2) of thread T1 marks the end of new read options in the option list of T2 with an $O_w(x)$ at the specified location. We get a contradiction only because of this $O_w(x)$. Suppose instead that we mark the beginning (and not the end) of new read options in the option list of T2. We then arrive at configuration 4(c), in which there is no pre-existing $O_w(x)$ at the end of the new entries. In this case, there is a trace of lossy loLRA for the annotated outcome of (Oscillation 4) in which W(x,1) and W(y,1) of T3 appear before W(x,2) of T1.


Next, we show that for a given program Pr, $\mathrm{Pr} \ltimes \mathrm{loLRA}$ satisfies the conditions of the WSTS framework that ensure decidability of the induced coverability problem (see, e.g., [9,15]). In particular, the compatibility condition between the well-quasi-ordering on states and the transitions is trivial, since we explicitly include the LOWER step in loLRA.


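Lemma 1. Given a program Pr, the LTS $\mathrm{Pr} \ltimes \mathrm{loLRA}$ equipped with the well-quasi-ordering $\sqsubseteq$ (lifted to states of $\mathrm{Pr} \ltimes \mathrm{loLRA}$ by defining $\langle p, B \rangle \sqsubseteq \langle p', B' \rangle$ iff $p = p'$ and $B \sqsubseteq B'$) is a WSTS that admits effective initialization and effective pred-basis.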


As a corollary, we obtain that state reachability under loLRA is decidable. We refer the reader to where we give more details and proofs (which generally follow those in [22]).


5 Equivalence of the Memory Systems for LRA


In this section, we establish the equivalence between loLRA and opLRA by demonstrating a simulation between these systems. The states of loLRA and opLRA are related to each other using write lists, which match read options in loLRA's potentials with concrete write events in opLRA's execution graphs.


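Definition 6. A write list is a sequence of write events and write options. Let $G$ be an execution graph, $L$ an option list, and $\mathrm{tid}_{\mathrm{RMW}} : \mathsf{W} \to \mathsf{Tid}$. A write list $W$ is a $(G, L, \mathrm{tid}_{\mathrm{RMW}})$-write-list if $|L| = |W|$ and the following hold for every $1 \le k \le |W|$:


— If $L(k)$ is a write option, then $W(k) = L(k)$.
— If $L(k) = \langle \tau, x, v, \tau_{\mathrm{RMW}} \rangle$, then $W(k) \in G.\mathsf{W}$, $\mathrm{tid}(W(k)) = \tau$, $\mathrm{loc}(W(k)) = x$, $\mathrm{val}_w(W(k)) = v$, and $\mathrm{tid}_{\mathrm{RMW}}(W(k)) = \tau_{\mathrm{RMW}}$.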
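Definition 6 is a local, position-by-position check on two sequences of equal length. The Rust sketch below is our own illustration and makes simplifying assumptions. An execution graph is reduced to its list of write events, each carrying a hypothetical numeric `id`. The function tid_RMW is given as a map from event ids to thread identifiers; an id missing from the map makes the check fail. All type and function names are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical encoding of options.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Opt {
    Read { tid: u32, loc: char, val: i64, rmw_tid: u32 },
    Write { loc: char },
}

/// A write event, reduced to the fields that Definition 6 inspects.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct WriteEvent {
    id: usize,
    tid: u32,
    loc: char,
    val: i64,
}

/// An entry of a write list: a write event or a write option O_w(loc).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Entry {
    Event(WriteEvent),
    WOpt(char),
}

/// Checks whether `w` is a (G, L, tid_RMW)-write-list, where G is represented
/// only by its write events `graph_writes`.
fn is_write_list(
    graph_writes: &[WriteEvent],
    l: &[Opt],
    w: &[Entry],
    tid_rmw: &HashMap<usize, u32>,
) -> bool {
    l.len() == w.len()
        && l.iter().zip(w).all(|(o, e)| match (*o, *e) {
            // A write option must be matched by the same write option.
            (Opt::Write { loc }, Entry::WOpt(loc2)) => loc == loc2,
            // A read option must be matched by a write event of G that agrees
            // on thread, location, value, and RMW thread identifier.
            (Opt::Read { tid, loc, val, rmw_tid }, Entry::Event(ev)) => {
                graph_writes.contains(&ev)
                    && ev.tid == tid
                    && ev.loc == loc
                    && ev.val == val
                    && tid_rmw.get(&ev.id) == Some(&rmw_tid)
            }
            _ => false,
        })
}
```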


In addition to the above, we require that and are maintained by any extension of the execution graph $G$ with a sequence of reads and writes of thread $\tau$ that are obtained by following the write list $W$. This is formalized in the following notion of $(G, \tau)$-consistency of a write list $W$.


250 Abhishek Kr Singh and Ori Lahav 


[Figure 8 panels C1 a) to C4 b): diagrams relating the write-list entries W(k) (and W(j)) and options L(k) of thread τ to events of other threads, with side conditions such as x = loc(W(k)), loc(w) = loc(W(k)), loc(W(j)) = loc(W(k)), and W(j) ≠ W(k).]

Fig. 8. Illustration of the conditions in Definition 7 for the $(G, \tau)$-consistency of $W$. Each condition is split into two cases (e.g., C1 is summarized using C1 a) and C1 b)).


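Definition 7. A write list $W$ is $(G,\tau)$-consistent if for every $1 \le k \le |W|$ with $W(k) \in \mathsf{E}$:


C1 $W(k) \notin \mathrm{dom}(G.\mathtt{hb}|_{\mathtt{loc}} ; [\mathsf{W}] ; G.\mathtt{hb}^? ; [\mathsf{E}^\tau \cup \{W(j) \mid 1 \le j < k\}])$.
C2 If $W(i) = O_w(\mathrm{loc}(W(k)))$ for some $i < k$, then $W(k) \notin \mathrm{dom}(G.\mathtt{hb}^? ; [\mathsf{E}^\tau \cup \{W(j) \mid 1 \le j < i\}])$.
C3 $W(k) \notin \mathrm{dom}((G.\mathtt{hb}|_{\mathtt{loc}} \setminus G.\mathtt{rf}) ; [\mathsf{R}] ; G.\mathtt{hb}^? ; [\mathsf{E}^\tau \cup \{W(j) \mid 1 \le j < k\}])$.
C4 If $\mathrm{loc}(W(j)) = \mathrm{loc}(W(k))$ and $W(k) \neq W(j)$ for some $j < k$, then $W(k) \notin \mathrm{dom}(G.\mathtt{hb}^? ; [\mathsf{E}^\tau \cup \{W(i) \mid 1 \le i < j\}])$.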


Intuitively, for any future extension of the execution graph with a sequence of events of $\tau$, conditions C1 and C2 help in maintaining weak-coherence, while C3 and C4 ensure that is preserved. To assist readers, these conditions are depicted using diagrams in Figure 8, where the shaded area of $\tau$ represents a sequence of future events.

The simulation relation Y is now defined as follows. 
Definition 8. A state $B \in \mathrm{loLRA}.Q$ matches an execution graph $G$, denoted by $B \mathrel{Y} G$, if there exists a function $\mathrm{tid}_{\mathrm{RMW}} : \mathsf{W} \to \mathsf{Tid}$ such that: (1) for every $\tau \in \mathsf{Tid}$ and $L \in B(\tau)$, there exists a $(G, \tau)$-consistent $(G, L, \mathrm{tid}_{\mathrm{RMW}})$-write-list, and (2) for every $\langle w, e \rangle \in G.\mathtt{rf} ; [\mathsf{RMW}]$, we have $\mathrm{tid}(e) = \mathrm{tid}_{\mathrm{RMW}}(w)$.
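Once rf is known, condition (2) of Definition 8 can be checked directly. The following Rust sketch uses our own hypothetical encoding, not the paper's. Events are indexed by position, rf is a list of (write index, read index) pairs, and tid_RMW is a map from write indices to thread identifiers.

```rust
use std::collections::HashMap;

/// Hypothetical event record, reduced to the fields condition (2) inspects.
struct Event {
    tid: u32,
    is_rmw: bool,
}

/// Condition (2) of Definition 8: for every (w, e) in G.rf ; [RMW],
/// the RMW event e belongs to thread tid_RMW(w).
fn rmw_condition(events: &[Event], rf: &[(usize, usize)], tid_rmw: &HashMap<usize, u32>) -> bool {
    rf.iter()
        .filter(|&&(_, e)| events[e].is_rmw) // keep only rf edges into RMW events
        .all(|&(w, e)| tid_rmw.get(&w) == Some(&events[e].tid))
}
```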


Based on the simulation relation, we establish the equivalence of loLRA and opLRA. The proof, given in [22], shows that $Y$ constitutes a forward simulation from loLRA to opLRA, and $Y^{-1}$ constitutes a backward simulation from opLRA to loLRA.


Theorem 1. The traces of loLRA and the traces of opLRA coincide. 


6 Conclusion, Related and Future Work 


We established the decidability of state reachability for finite-state programs under LRA, a memory model that lies strictly between WRA and RA. To this end, we adapted the potential-based semantics of WRA from to LRA, and showed that it meets the requirements for decidability of the WSTS framework.

In addition to the closely related work discussed in the introduction to this 
paper, the paper studies the problem of verifying whether a given memory 
system provides causal consistency, which is a different verification problem than 
the one discussed in the current paper. The CC model in (when restricted 
to single instruction transactions) is equivalent to (the RMW-free fragment of) 
WRA, whereas CCv from is equivalent to SRA. 

Another line of related work concerns parametrized programs, where one has an unknown number of threads but all of them run the same code. This gives rise to a decidable verification problem under SC and TSO [5], but the decidability of this problem is still unknown for WRA, SRA, and LRA. For the RMW-free fragment, this problem is PSPACE-complete for TSO [8] as well as for RA [19] (the latter result also allows a fixed number of distinguished threads running loop-free programs, possibly including RMWs).

An interesting direction for future work is to try to further close the gap 
between LRA and RA by introducing a restricted form of RA’s modification 
order. A related problem that is still open (to the best of our knowledge) is 
whether the fragment of RA without RMWs induces a decidable verification 
problem. In addition, other models with undecidable reachability problems (such 
as the promising semantics and the full POWER model) may be bounded
from below by decidable models. 
Abstract. Decision diagrams (DDs) are an important data structure in
computer science with applications ranging from circuit design and verifi- 
cation to machine learning. Most prominently, binary DDs are commonly 
used to succinctly represent Boolean functions. Due to the practical im- 
portance of DDs, there is an ongoing quest for high-performance software 
libraries supporting the construction and manipulation of DDs. With 
OxiDD, we present a new framework for DDs that focuses on safety, 
concurrency, and modularity. Following a highly modular design we im- 
plement OxiDD in Rust, which facilitates the integration of various kinds 
of DDs such as MTBDDs, ZBDDs, and TDDs, all within safe code also in 
a concurrent setting. Already in its initial release, OxiDD does not com- 
promise performance, which we show to be on par with or even better 
than established highly optimized DD libraries. 


1 Introduction 


Boolean functions play a central role in the design and analysis of computing 
systems. They frequently appear in different representations through logics, cir- 
cuits, machine learning classifiers, or binary decision diagrams (BDDs) [1,12]. In 
particular, BDD representations are appealing as they are strongly normalizing 
and provide efficient operations such as applying Boolean operators, finding and 
counting satisfying assignments, or checking equivalence. Applications of BDDs 
encompass a wide range, including symbolic model checking and logic synthe- 
sis [13,15,26,20,16]. Much work on BDD research and implementations has been 
conducted during the first two decades after Bryant’s seminal work [12]. This 
led to various other types of decision diagrams (DDs) that extend the core principles beyond Boolean functions or improve the efficiency for specific applications. Most prominently, multi-terminal BDDs (MTBDDs) [17,4] enable pseudo-Boolean function representations, zero-suppressed BDDs (ZBDDs) [32] usually provide more efficient representations for sparse sets than BDDs, and list DDs (LDDs) [9] efficiently encode transition vectors.

The most frequently used BDD libraries that are still considered state-of-the-art are BuDDy [27] and CUDD [40]. They originate from the 1990s and do not fully exploit recent scientific advancements and modern design opportunities. DDs, and BDDs in particular, have recently been gaining renewed attention, incorporating insights from satisfiability checking [8] but also providing advances in distributed and parallel computation and feature selection algorithms [18,39,7,23]. Sylvan [18] is a more recent BDD library that focuses on multithreaded operators; however, it is also written entirely in C. Hence, it requires all memory management to be done manually, which is particularly challenging in the parallel setting. Manual resource management is one of the common sources of bugs that lead to undefined behavior (UB), a situation where the programming language does not assign any semantics to the code. Consequences of UB include crashing programs and wrong results, the latter being particularly intolerable in verification tools or other critical applications where BDD libraries are commonly employed. Further, while existing libraries provide support for different kinds of BDDs such as MTBDDs or ZBDDs, the inherent lack of genericity in C required specifically tailored implementations. More elaborate extensions, e.g., towards ternary decision diagrams (TDDs) [38], would also require major internal changes in the library implementations.


In this paper, we develop a new DD framework, called OxiDD, to provide the
basis for future developments in DD research and technology. As such, OxiDD 
focuses on easing the implementation of new DD types, providing reusable com- 
ponents commonly used in different kinds of DDs, and relying on modern tech- 
nology. This leads to the following four major development goals for OxiDD: 
safety, concurrency, modularity, and performance. 

By safety, we mean the absence of undefined behavior. Concurrency refers 
to thread-safety when used from multithreaded applications on the one hand. 
On the other hand, the framework itself should leverage multicore architectures 
for performance. Modularity should already be fulfilled by the nature of a framework, clearly separating concerns and enhancing extensibility. Here, clear interfaces should separate algorithms from data structures and make it easy to replace one implementation of a component with another.

We tackle all four development goals by implementing OxiDD in Rust, which is considered to be a safe programming language. Rust achieves safety via a rich type system but does not compromise performance: usually, Rust programs do not show any runtime overhead compared to C/C++. Furthermore, Rust allows us to define clear and generic interfaces, as well as efficient implementations of data structures. Here, too, genericity does not come with any runtime overhead, as the compiler generates specialized code at compile time. For high performance, we opt into Unsafe Rust, a language syntactically separated from Safe Rust using the unsafe keyword. Unsafe Rust enables a few additional operations whose safety cannot be checked by the compiler. Connecting Unsafe and Safe Rust requires safe abstractions upholding the central soundness property of Safe Rust: “No matter what, Safe Rust can't cause Undefined Behavior.” [36] The art is to keep the portion of unsafe code as small as possible without violating the soundness property. One instance where we need Unsafe Rust is to support reordering of variables without node-wise locking. In this case, designing safe abstractions has been challenging. In the end, however, we gain both performance and implementations of all DD operations entirely in Safe Rust.

Library      Version  Last Release  Language
Adiar [39]   1.2.2    2022/11       C++
Biddy [31]   2.2.1    2022/12       C
BuDDy [27]   2.4      2004          C
CAL [33]     2.1.1    2022/01       C
OxiDD                               Rust
Fig. 1: Popular DD libraries


Contributions and Outline. We report on generic implementations of BDDs, 
MTBDDs, ZBDDs, and TDDs in OxiDD, focusing on implementation design and 
evaluating OxiDD's performance. Section 2 gives a detailed description of these 
DD types and enhancements. For working with these DDs from Rust, we provide high-level interfaces that are similar to those of existing libraries but, in contrast to them, cannot cause UB. We also provide C and C++ bindings. Section 3 goes
into more detail about the framework's architecture and implementation details. 
We also point out some insights from tuning the data structures for performance. 
For this, we design safe abstractions, a highly non-trivial process we report on in 
Section 3.3. In Section 4, we finally evaluate the performance of OxiDD's BDD 
implementation. Our results show that OxiDD is on par with existing libraries, 
and even outperforms them in certain scenarios. This lets us conclude that in 
OxiDD, safety and modularity do not come at the expense of performance. 


Further Related Work. For an overview comparing the features of popular 
and recently maintained BDD libraries, see Fig. 1. Here, BCDD refers to BDDs 
with complemented edges. The standard libraries BuDDy, CUDD, and Sylvan 
are widely used in several communities due to their manifold BDD manipulation 
operators and rich functionalities. Besides those, there are various other libraries 
that mostly provide specialized implementations. Biddy [31] mainly started as an educational implementation but nowadays also supports a wide range of different BDD types such as tagged BDDs [19,14]. Java implementations such as JDD [41], BeeDeeDee [28], or PJBDD [7] provide better safety properties than C implementations, but usually cannot compete in performance. When DDs grow beyond the size of the entire main memory, it becomes especially important to reduce the number of random disk accesses. This is what the external memory libraries Adiar [39] and CAL [37] focus on. Development of CAL ceased back in 1996, but it was recently brought back to life in the context of research on Adiar. Biodivine/LibBDD [6] is a notable BDD implementation in Rust and, to the best of our knowledge, the only Rust library besides OxiDD that supports existential and universal quantification. We are not aware of any DD implementation in the spirit of a modular framework that emphasizes safety as much as OxiDD does, while being concurrent and delivering high performance.


[Figure 2 panels: (a) BDD and (b) BCDD representing f and g over the variables x0, x1, x2; edge kinds: then, else, else complement.]


Fig. 2: Example decision diagrams for Boolean functions $f, g\colon \mathbb{B}^3 \to \mathbb{B}$ where $f(x_0, x_1, x_2) = \neg(x_1 \lor x_2)$ and $g(x_0, x_1, x_2) = x_0 \oplus x_1 \oplus x_2$


2 Background: Decision Diagrams and Rust 


We recall the kinds of DDs relevant for this paper, explain the role of variable orders and variable reordering, and recall preliminaries on safe abstractions in Rust.


2.1 Kinds of Decision Diagrams 


Decision trees (DTs) are tree-like structures that represent functions through variable-labeled decision nodes and terminal nodes with function outcomes. Each path from the root to a terminal corresponds to an assignment of values to variables, and the terminal gives the function outcome for that assignment. Decision diagrams (DDs) are rooted directed acyclic graphs that arise from DTs by merging isomorphic subtrees. We assume DDs to be ordered, i.e., variable occurrences follow a given total order on all paths in the DD. The order restriction may also be formulated by assigning each node a level, which we number from top to bottom. Then, a variable order $\sigma$ is a bijection between the levels $0, \dots, k-1$ and the $k$ input variables.
    fn apply_and(n: &Node, m: &Node) -> &Node {
        // terminal cases
        if n == m { return n; }
        if n == ⊥ || m == ⊥ { return ⊥; }
        if n == ⊤ { return m; }
        if m == ⊤ { return n; }
        if n.level < m.level { // n is above m
            level = n.level; t = apply_and(n.t, m); e = apply_and(n.e, m);
        } else if n.level == m.level {
            level = n.level; t = apply_and(n.t, m.t); e = apply_and(n.e, m.e);
        } else { // n is below m
            level = m.level; t = apply_and(n, m.t); e = apply_and(n, m.e);
        }
        return get_or_make_node(level, t, e);
    }


Fig. 3: Apply algorithm for conjunctions (pseudocode)


Terminals are considered to be on a distinguished level $\infty$ at the bottom. Then, every node at level $i$ can only have successor nodes at levels greater than $i$.
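The ordering requirement can be checked in a single pass over the inner nodes. This Rust sketch is our own. The arena encoding with `Child` and `Node` is hypothetical, and child indices are assumed to be valid. Terminals count as level ∞ and therefore never violate the condition.

```rust
/// Hypothetical arena encoding: a child is either a terminal (on level ∞)
/// or an inner node given by its index in the arena.
#[derive(Clone, Copy)]
enum Child {
    Terminal(bool),
    Inner(usize),
}

struct Node {
    level: u32,
    t: Child,
    e: Child,
}

/// Checks that every edge leads from a node at level i to a node at a level
/// greater than i. Child indices are assumed valid (out-of-range ones panic).
fn is_ordered(nodes: &[Node]) -> bool {
    nodes.iter().all(|n| {
        [n.t, n.e].iter().all(|&c| match c {
            Child::Terminal(_) => true, // level ∞ is greater than any i
            Child::Inner(j) => nodes[j].level > n.level,
        })
    })
}
```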


Binary DDs (BDDs). The most prominent kind of DDs are BDDs, used to represent Boolean functions $f\colon \mathbb{B}^k \to \mathbb{B}$ over $\mathbb{B} = \{\bot, \top\}$. They comprise terminal nodes $\top$ and $\bot$ as well as inner nodes $n$ with "then" and "else" edges pointing to nodes $n_t$ and $n_e$, respectively. By $n, m, \dots$ we usually denote nodes, and by $x_0, x_1, \dots$ variables. BDDs are usually considered to be reduced, i.e., for any inner nodes $n, m$: (1) $n_t \neq n_e$, and (2) if $\mathrm{level}(n) = \mathrm{level}(m)$, $n_t = m_t$, and $n_e = m_e$, then $n = m$. One major advantage of such BDDs is that they are strongly normalizing, i.e., for a fixed variable order, any two reduced BDDs representing the same Boolean function are isomorphic [21]. Shared BDDs associate function names with nodes, allowing multiple functions to be represented in a single BDD structure. See Fig. 2a for an example of a (shared reduced) BDD with two functions $f$ and $g$.

The semantics $[\![n]\!]$ of a BDD node $n$ is recursively defined as a Boolean function. If $n$ is a terminal, $[\![n]\!]$ is a constant function, always mapping to true if $n = \top$ or to false if $n = \bot$. If $n$ is an inner node at level $i$, then $[\![n]\!]$ is $(x_{\sigma(i)} \land [\![n_t]\!]) \lor (\neg x_{\sigma(i)} \land [\![n_e]\!])$, the Shannon decomposition of $[\![n]\!]$ w.r.t. $x_{\sigma(i)}$.
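The recursive semantics can be executed directly. In the following Rust sketch (our own; the `Bdd` enum and `eval` are hypothetical names), `eval` evaluates a node under an assignment. Here `sigma[i]` gives the index of the variable placed at level i, so it plays the role of the variable order. Subgraphs are shared through `Rc`, which makes the structure a DAG rather than a tree.

```rust
use std::rc::Rc;

/// Hypothetical BDD representation; subgraphs are shared through `Rc`.
enum Bdd {
    Top,
    Bot,
    Inner { level: usize, t: Rc<Bdd>, e: Rc<Bdd> },
}

/// Evaluates [[n]] under assignment `a` (indexed by variable), following the
/// Shannon decomposition: at level i, branch on the variable x_{sigma[i]}.
fn eval(n: &Bdd, sigma: &[usize], a: &[bool]) -> bool {
    match n {
        Bdd::Top => true,
        Bdd::Bot => false,
        Bdd::Inner { level, t, e } => {
            if a[sigma[*level]] {
                eval(t, sigma, a)
            } else {
                eval(e, sigma, a)
            }
        }
    }
}
```

With the identity order, a node on level 1 whose "then" edge leads to ⊥ and whose "else" edge leads to a level-2 node testing x2 (⊥ on "then", ⊤ on "else") represents ¬(x1 ∨ x2). Permuting `sigma` changes which variables the same structure tests.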

A BDD is typically created by successively applying Boolean connectives to
already existing BDDs. As an example, the apply algorithm for conjunctions
works as shown in Fig. 3. Here, it is assumed that the get_or_make_node function
at the bottom also maintains reducedness, typically implemented using a
hash table called unique table [11]. Note that the runtime of a naive apply_and
implementation is exponential in the number of variables of the functions
represented by $n$ and $m$. By applying memoization, the runtime can be reduced to
$O(|n| \cdot |m|)$, where $|\cdot|$ denotes the count of descendant nodes. Memoization is
typically implemented using a fixed-size cache called apply cache or computed table.
The design of combining unique table and computed table towards an efficient
BDD implementation was originally proposed by Brace et al. [11]. Besides apply
algorithms based on recursion, there are also breadth-first apply algorithms,
implemented, e.g., in the BDD libraries CAL and Adiar [37,39].
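For illustration, the following minimal single-threaded Rust sketch shows how a unique table enforces the two reduction rules and how memoization bounds the work of apply_and. It is not OxiDD's code and does not reproduce Fig. 3; all names are hypothetical, the terminals are the IDs 0 ($\bot$) and 1 ($\top$) at level u32::MAX (standing in for $\infty$), and, unlike a real apply cache, the computed table here is an unbounded HashMap.

```rust
use std::collections::HashMap;

// Hypothetical toy BDD manager: IDs 0 and 1 are the terminals ⊥ and ⊤,
// inner node i (in `nodes`) has ID i + 2.
pub type NodeId = usize;
pub const BOT: NodeId = 0;
pub const TOP: NodeId = 1;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct Node { pub level: u32, pub then: NodeId, pub els: NodeId }

#[derive(Default)]
pub struct Manager {
    nodes: Vec<Node>,
    unique_table: HashMap<Node, NodeId>,               // ensures reducedness
    computed_table: HashMap<(NodeId, NodeId), NodeId>, // memoization (unbounded here)
}

impl Manager {
    /// Terminals sit at the bottom level (u32::MAX plays the role of ∞).
    pub fn level(&self, n: NodeId) -> u32 {
        if n <= TOP { u32::MAX } else { self.nodes[n - 2].level }
    }

    /// Returns the node (level, then, els), enforcing both reduction rules.
    pub fn get_or_make_node(&mut self, level: u32, then: NodeId, els: NodeId) -> NodeId {
        if then == els {
            return then; // rule (1): no redundant tests
        }
        let node = Node { level, then, els };
        if let Some(&id) = self.unique_table.get(&node) {
            return id; // rule (2): no duplicate nodes
        }
        let id = self.nodes.len() + 2;
        self.nodes.push(node);
        self.unique_table.insert(node, id);
        id
    }

    pub fn var(&mut self, level: u32) -> NodeId {
        self.get_or_make_node(level, TOP, BOT)
    }

    /// Cofactors of `n` w.r.t. the variable at `level` (`n` itself if `n` lies below).
    fn cofactors(&self, n: NodeId, level: u32) -> (NodeId, NodeId) {
        if self.level(n) == level {
            let node = self.nodes[n - 2];
            (node.then, node.els)
        } else {
            (n, n)
        }
    }

    pub fn apply_and(&mut self, n: NodeId, m: NodeId) -> NodeId {
        // Terminal cases.
        if n == BOT || m == BOT { return BOT; }
        if n == TOP { return m; }
        if m == TOP || n == m { return n; }
        // Conjunction is commutative, so normalize the cache key.
        let key = (n.min(m), n.max(m));
        if let Some(&res) = self.computed_table.get(&key) { return res; }
        let top = self.level(n).min(self.level(m));
        let (nt, ne) = self.cofactors(n, top);
        let (mt, me) = self.cofactors(m, top);
        let t = self.apply_and(nt, mt);
        let e = self.apply_and(ne, me);
        let res = self.get_or_make_node(top, t, e);
        self.computed_table.insert(key, res);
        res
    }

    /// Evaluates [[n]]; `assignment[l]` is the value of the variable at level l.
    pub fn eval(&self, mut n: NodeId, assignment: &[bool]) -> bool {
        while n > TOP {
            let node = self.nodes[n - 2];
            n = if assignment[node.level as usize] { node.then } else { node.els };
        }
        n == TOP
    }

    pub fn node_count(&self) -> usize { self.nodes.len() }
}
```

A bounded, lossy apply cache would only change how often results are recomputed, not the results themselves, because the unique table makes every result canonical.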
Complement Edges. To reduce the number of nodes in a BDD and to support
negation in shared BDDs in $O(1)$, Brace et al. proposed complement edges as
a new edge type in DDs [11]. We abbreviate BDDs that contain complement
edges by BCDDs. The semantics of a complemented edge pointing to a node
$n$ is just $\neg[\![n]\!]$. To recover a strong normal form, we remove the $\bot$ terminal
node ($\bot$ is then represented by a complemented edge to $\top$) and impose the
restriction that a “then” edge is never complemented. Besides the two standard
conditions on reduced BDDs, the latter forms the third condition rendering
BCDDs reduced. To ensure this condition, any node $n$ whose
“then” edge is complemented can be replaced by a node $n'$ whose “then” edge
is regular. The “else” edge of $n'$ is the complement of the “else” edge of $n$ such
that $[\![n']\!] = \neg[\![n]\!]$. This means that all nodes that previously referred to $n$ with
a regular edge now have to use a complemented edge to $n'$ and vice versa. This
is the reason why, in contrast to apply_and in Fig. 3, we formulate all
algorithms based on edges (i.e., possibly tagged node references) rather than
simple node references. Since functions $f$ and $\neg f$ are represented by a single
node, BCDDs may halve the number of nodes compared to BDDs.
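The normalization step can be sketched as follows, assuming a toy encoding (hypothetical, not OxiDD's) in which an edge carries a boolean complement tag, node 0 is the only terminal $\top$, and $\bot$ is a complemented edge to it. Whenever the requested “then” edge is complemented, make_node uses the identity $\mathrm{ite}(x, \neg t, e) = \neg\,\mathrm{ite}(x, t, \neg e)$ to store a node with a regular “then” edge and to return a complemented edge to it.

```rust
use std::collections::HashMap;

/// Hypothetical possibly complemented node reference. Node 0 is the single
/// terminal ⊤; ⊥ is represented as a complemented edge to it.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct Edge { pub node: usize, pub complemented: bool }

impl Edge {
    pub const TOP: Edge = Edge { node: 0, complemented: false };
    pub const BOT: Edge = Edge { node: 0, complemented: true };

    /// Negation in O(1): only the tag is flipped.
    pub fn not(self) -> Edge { Edge { node: self.node, complemented: !self.complemented } }
}

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct Node { level: u32, then: Edge, els: Edge }

#[derive(Default)]
pub struct Manager { nodes: Vec<Node>, unique_table: HashMap<Node, usize> }

impl Manager {
    /// Creates a node while maintaining all three BCDD reduction rules.
    pub fn make_node(&mut self, level: u32, then: Edge, els: Edge) -> Edge {
        if then == els {
            return then; // rule (1)
        }
        if then.complemented {
            // Rule (3): ite(x, ¬t, e) = ¬ite(x, t, ¬e), so store the node with a
            // regular "then" edge and hand out a complemented edge to it.
            return self.make_node(level, then.not(), els.not()).not();
        }
        let node = Node { level, then, els };
        if let Some(&id) = self.unique_table.get(&node) {
            return Edge { node: id, complemented: false }; // rule (2)
        }
        self.nodes.push(node);
        let id = self.nodes.len(); // inner nodes have IDs 1, 2, ...
        self.unique_table.insert(node, id);
        Edge { node: id, complemented: false }
    }

    /// Complement tags accumulate along the evaluated path.
    pub fn eval(&self, mut e: Edge, assignment: &[bool]) -> bool {
        let mut negated = false;
        loop {
            negated ^= e.complemented;
            if e.node == 0 {
                return !negated; // reached ⊤
            }
            let n = self.nodes[e.node - 1];
            e = if assignment[n.level as usize] { n.then } else { n.els };
        }
    }

    pub fn node_count(&self) -> usize { self.nodes.len() }
}
```

Requesting the node for $\neg x$ (then $\bot$, else $\top$) therefore yields a complemented edge to the existing node for $x$ instead of a second node.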


Zero-Suppressed BDDs. A function $f\colon \mathbb{B}^k \to \mathbb{B}$ may also be interpreted as the
characteristic function of a set $S = \{v \in \mathbb{B}^k \mid f(v) = \top\} \subseteq \mathbb{B}^k$. We can even view
a Boolean vector as a subset of some “universe” $U$, so we also have $S \subseteq \mathcal{P}(U)$. For
example, let $U = \{a, b\}$. The function $a$ represents the set of all sets containing $a$,
i.e., $\{\{a\}, \{a, b\}\}$. Conversely, the set $\{\{a\}\}$ is represented by the function $a \wedge \neg b$.
This means that we can use BDDs to represent sets of Boolean vectors or sets of
finite sets. If these sets are sparse, however, the corresponding BDD can be very
large. Zero-suppressed BDDs (ZBDDs, ZDDs, or ZSDDs), which were introduced
by Minato [32], are more apt for this use case. Like BDDs, ZBDDs have inner
nodes with two outgoing edges, which we call hi and lo here. The terminal nodes are $\emptyset$
(“empty”) and $\{\emptyset\}$ (“base”). Their semantics is just $[\![\emptyset]\!] = \emptyset$ and $[\![\{\emptyset\}]\!] = \{\emptyset\}$.
For an inner node $n$ at level $i$, we have $[\![n]\!] = [\![n_{lo}]\!] \cup \{\{x_{\sigma(i)}\} \cup \alpha \mid \alpha \in [\![n_{hi}]\!]\}$. To
ensure reduced ZBDDs, a different first condition than for BDDs is imposed: While
in BDDs the children of every node $n$ should represent different functions, i.e.,
(1) $n_t \neq n_e$, in ZBDDs we require that the node itself and its lo-child
represent different functions, i.e., (1) $n_{hi} \neq \emptyset$.
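The ZBDD semantics and the zero-suppression rule translate almost literally into code. In the following hypothetical sketch, IDs 0 and 1 are the terminals $\emptyset$ and $\{\emptyset\}$, variables are identified with their level, and family computes $[\![n]\!]$ explicitly as a set of sets, which is only sensible for tiny examples.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical toy ZBDD: ID 0 is the terminal ∅ ("empty"), ID 1 is {∅} ("base").
pub const EMPTY: usize = 0;
pub const BASE: usize = 1;

/// A family of sets of variables.
pub type Family = BTreeSet<BTreeSet<u32>>;

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct Node { var: u32, hi: usize, lo: usize }

#[derive(Default)]
pub struct Zbdd { nodes: Vec<Node>, unique_table: HashMap<Node, usize> }

impl Zbdd {
    pub fn make_node(&mut self, var: u32, hi: usize, lo: usize) -> usize {
        if hi == EMPTY {
            return lo; // zero-suppression rule (1): such a node would represent [[lo]]
        }
        let node = Node { var, hi, lo };
        if let Some(&id) = self.unique_table.get(&node) {
            return id; // rule (2): no duplicate nodes
        }
        self.nodes.push(node);
        let id = self.nodes.len() + 1; // inner node IDs start at 2
        self.unique_table.insert(node, id);
        id
    }

    /// [[n]] = [[n_lo]] ∪ { {x} ∪ α | α ∈ [[n_hi]] }
    pub fn family(&self, n: usize) -> Family {
        match n {
            EMPTY => Family::new(),
            BASE => Family::from([BTreeSet::new()]),
            _ => {
                let node = self.nodes[n - 2];
                let mut result = self.family(node.lo);
                for mut alpha in self.family(node.hi) {
                    alpha.insert(node.var);
                    result.insert(alpha);
                }
                result
            }
        }
    }
}
```

With $a$ as variable 0 and $b$ as variable 1, the family $\{\{a\}\}$ needs a single inner node for $a$; unlike in the BDD for $a \wedge \neg b$, the variable $b$, which occurs in none of the sets, gets no node at all.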


Multi-Terminal BDDs (MTBDDs). While BDDs only contain the two terminal
nodes $\bot$ and $\top$, MTBDDs allow for arbitrarily (but finitely) many terminals [17].
Hence, MTBDDs can represent functions $\mathbb{B}^k \to S$, where $S$ is an arbitrary set.
A prominent application for MTBDDs is symbolic probabilistic model
checking [5], where $S = [0, 1]$. To allow such infinite sets, terminal nodes are usually
created on demand, ensuring finiteness due to the finitely many inner nodes of the
MTBDD. MTBDDs are also known as algebraic decision diagrams (ADDs) [4].
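On-demand creation of terminals amounts to interning values. A minimal sketch, assuming a hypothetical generic store (not an API of any particular library): a value obtains a terminal ID the first time it is requested, so only values that actually occur in the MTBDD are stored. Since f64 implements neither Eq nor Hash, the usage below keys probabilities by their bit pattern (f64::to_bits), a simplification under which, e.g., 0.0 and -0.0 would become distinct terminals.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical MTBDD terminal store: one terminal per distinct value,
/// created the first time the value is requested.
pub struct TerminalStore<T> {
    values: Vec<T>,          // terminal ID -> value
    ids: HashMap<T, usize>,  // value -> terminal ID
}

impl<T: Clone + Eq + Hash> TerminalStore<T> {
    pub fn new() -> Self {
        TerminalStore { values: Vec::new(), ids: HashMap::new() }
    }

    /// Returns the ID of the terminal for `value`, creating it if needed.
    pub fn get_or_make(&mut self, value: T) -> usize {
        if let Some(&id) = self.ids.get(&value) {
            return id;
        }
        let id = self.values.len();
        self.values.push(value.clone());
        self.ids.insert(value, id);
        id
    }

    pub fn value(&self, id: usize) -> &T { &self.values[id] }

    pub fn len(&self) -> usize { self.values.len() }
}
```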


Multivalued DDs (MDDs). Representing functions $D_0 \times \cdots \times D_{k-1} \to S$
poses implementation challenges. For finite domains $D_i$, we could rely on a
binary encoding and resort to (MT)BDDs, then also called finite domain decision
diagrams (FDDs). However, the properties of such FDDs heavily depend on the
chosen bit-blasting encoding of the domains. As an alternative, MDDs directly
encode the multiple values of a variable as multiple outgoing edges [25]. Just like in MTBDDs,
there is one terminal node per (used) value of $S$. Ternary decision diagrams
(TDDs) may be viewed as one instance of MDDs, where $D_0 = \cdots = D_{k-1} =
S = \{\bot, ?, \top\}$. That is, TDDs represent functions of three-valued logic [38].
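To illustrate MDD edges, the following sketch evaluates an MDD by following, at each node, the outgoing edge indexed by the current variable's value. As an instance, it encodes a TDD for Kleene's three-valued conjunction $x_0 \wedge x_1$ over $\{\bot, ?, \top\}$. The types are hypothetical, and for brevity the diagram is a tree, i.e., its nodes are neither shared nor reduced.

```rust
/// Hypothetical MDD: an inner node has one outgoing edge per value of its
/// variable's domain; terminals carry values of S.
pub enum Mdd<S> {
    Terminal(S),
    Inner { var: usize, children: Vec<Mdd<S>> },
}

impl<S: Copy> Mdd<S> {
    /// `assignment[var]` is the index of the variable's value in its domain.
    pub fn eval(&self, assignment: &[usize]) -> S {
        let mut node = self;
        loop {
            match node {
                Mdd::Terminal(s) => return *s,
                Mdd::Inner { var, children } => node = &children[assignment[*var]],
            }
        }
    }
}

/// Three-valued logic: the derived order False < Unknown < True makes
/// Kleene's conjunction the minimum.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
pub enum Tri { False, Unknown, True }

/// TDD for x0 ∧ x1; domain index 0 = ⊥, 1 = ?, 2 = ⊤.
pub fn kleene_and_tdd() -> Mdd<Tri> {
    use Tri::*;
    Mdd::Inner {
        var: 0,
        children: vec![
            // x0 = ⊥
            Mdd::Terminal(False),
            // x0 = ?
            Mdd::Inner {
                var: 1,
                children: vec![Mdd::Terminal(False), Mdd::Terminal(Unknown), Mdd::Terminal(Unknown)],
            },
            // x0 = ⊤
            Mdd::Inner {
                var: 1,
                children: vec![Mdd::Terminal(False), Mdd::Terminal(Unknown), Mdd::Terminal(True)],
            },
        ],
    }
}
```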


2.2 Reordering 


The size of a DD, no matter of which kind, may heavily depend on its variable
order. There are functions $\mathbb{B}^k \to \mathbb{B}$ for which some variable orders lead
to node counts exponential in $k$, whereas others lead to node counts in $O(k)$. Determining whether a
variable order is suboptimal itself is an NP-complete problem [10], but there 
are heuristics to derive a good variable order from a (propositional) formula 
describing the function [34,2]. However, there are applications where such a 
formula is not available in advance. Furthermore, building the BDD for some 
intermediate result may require a different variable order than building the final 
BDD. In such cases, it is possible to reorder the existing DD, e.g., using Rudell's 
sifting algorithm [35]. The core of this algorithm is to pick a variable, try out all 
positions for it, and then move it to the best position. This procedure is repeated 
until no improvement is made. 
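The sifting loop can be sketched as follows. This is a hypothetical simplification: it treats the DD size as an abstract function of the variable order (order[l] is the variable at level l) and evaluates it after each adjacent swap, whereas a real implementation swaps levels of the existing DD in place and reads off the node count. The usage example uses the number of inversions as a stand-in cost.

```rust
/// Hypothetical sketch of sifting over an abstract size oracle.
/// A real DD package would obtain each neighbouring order by swapping two
/// adjacent levels in place instead of calling `size` on a new order.
pub fn sift(order: &mut [usize], size: impl Fn(&[usize]) -> usize) {
    let mut best_size = size(&order[..]);
    loop {
        let before = best_size;
        for var in order.to_vec() {
            // Try all positions for `var` via adjacent swaps, remembering the best.
            let mut pos = order.iter().position(|&v| v == var).unwrap();
            let mut best_pos = pos;
            while pos > 0 {
                // sift up
                order.swap(pos - 1, pos);
                pos -= 1;
                let s = size(&order[..]);
                if s < best_size { best_size = s; best_pos = pos; }
            }
            while pos + 1 < order.len() {
                // sift down
                order.swap(pos, pos + 1);
                pos += 1;
                let s = size(&order[..]);
                if s < best_size { best_size = s; best_pos = pos; }
            }
            // Move `var` back up to the best position found.
            while pos > best_pos {
                order.swap(pos - 1, pos);
                pos -= 1;
            }
        }
        if best_size >= before {
            break; // a whole round without improvement
        }
    }
}
```

Because a variable only ever moves by adjacent swaps, the relative order of all other variables is unchanged, so returning to best_pos restores exactly the best order seen.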

There are various other reordering algorithms, but moving a variable to another
position usually boils down to swapping all nodes of adjacent levels. Key
characteristics of a variable swap are that the semantics of nodes is preserved and
that the operation can be performed in-place, i.e., locally. This is crucial, because
nodes at levels $i$ and $i + 1$ may be referenced by many nodes at higher levels. To
explain the swap operator, we restrict ourselves to BDDs for simplicity. Let $n$
be a node initially at level $i$ where at least one of $n_t$ and $n_e$ is initially at level
$i + 1$. The semantics of $n$ then depends on both the upper variable $x = x_{\sigma(i)}$
and the lower variable $y = x_{\sigma(i+1)}$. Hence, $n$ must stay at level $i$, which now carries $y$:
its “then” edge is redirected towards a node for $[\![n]\!][y := \top]$ (i.e., $[\![n]\!]$ with $y$ set to true) and
its “else” edge towards a node for $[\![n]\!][y := \bot]$. If the new children already existed and the old
children have no incoming edges anymore, the node count decreases. Otherwise,
the node count may well stay the same or even increase.
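The effect of a swap on a single node can be sketched on a tiny, unshared BDD. The hypothetical swap_root below computes the four cofactors of $n$ w.r.t. $x$ and $y$ and rebuilds $n$ with $y$ on top; in a real shared BDD, the node would instead be modified in place and the new children would be looked up in the unique table.

```rust
/// Hypothetical unshared BDD, used only to illustrate swapping two adjacent variables.
#[derive(Clone, PartialEq, Debug)]
pub enum Bdd {
    Leaf(bool),
    Node { var: usize, then: Box<Bdd>, els: Box<Bdd> },
}

/// Creates a node, applying the redundant-test reduction rule.
fn mk(var: usize, then: Bdd, els: Bdd) -> Bdd {
    if then == els {
        then
    } else {
        Bdd::Node { var, then: Box::new(then), els: Box::new(els) }
    }
}

/// Cofactor of `f` for `y := value`, assuming `y` can only occur at the root of `f`.
fn cofactor(f: &Bdd, y: usize, value: bool) -> Bdd {
    match f {
        Bdd::Node { var, then, els } if *var == y => {
            if value { (**then).clone() } else { (**els).clone() }
        }
        _ => f.clone(),
    }
}

/// Swaps the root variable x of `n` with the variable y directly below it:
/// the new root tests y; its children are [[n]][y := ⊤] and [[n]][y := ⊥], each testing x.
pub fn swap_root(n: &Bdd, y: usize) -> Bdd {
    match n {
        Bdd::Node { var: x, then, els } => {
            let (f11, f10) = (cofactor(then, y, true), cofactor(then, y, false));
            let (f01, f00) = (cofactor(els, y, true), cofactor(els, y, false));
            mk(y, mk(*x, f11, f01), mk(*x, f10, f00))
        }
        Bdd::Leaf(_) => n.clone(),
    }
}

/// `assignment[v]` is the value of variable v.
pub fn eval(f: &Bdd, assignment: &[bool]) -> bool {
    match f {
        Bdd::Leaf(b) => *b,
        Bdd::Node { var, then, els } => {
            if assignment[*var] { eval(then, assignment) } else { eval(els, assignment) }
        }
    }
}
```

If $n$ does not depend on $y$ at all, both rebuilt children coincide and mk returns a node equal to $n$, matching the case where $n$ simply moves down to level $i + 1$.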


2.3 The Power of Safe Abstractions 


Rust’s central soundness property, “No matter what, Safe Rust can’t cause Undefined
Behavior” [36], is very powerful. In general, while software components may
seem sound in isolation, their composition can still cause undefined behavior (UB). This is because
computer-checkable interface specifications, e.g., function types, are usually too
limited to capture all conditions required to prevent UB.

For Safe Rust, the situation regarding UB (notably including data races) is
different. Due to the soundness property, we can be sure that any composition 
of components either does not cause UB or is forbidden by the type system. 
While this translates to peace of mind for the user, it also requires a soundness 
argument for every piece of unsafe Rust code. For instance, the following unsafe 
code is unsound: 
```rust
fn bad_deref(ptr: *const u32) -> u32 { unsafe { *ptr } }
```

Inside the unsafe block, we dereference a raw pointer, which is an unsafe opera- 
tion. The unsafety arises from the fact that dereferencing a dangling pointer has 
no defined semantics. Now, we would need to argue why ptr cannot be dangling.
However, bad_deref is a safe function, so any pointer, including a dangling one,
can be passed to it from Safe Rust; hence, the code is unsound.

To remedy this issue, the function must be marked unsafe as well, so that it 
cannot be called from Safe Rust. Note that this now requires the use of unsafe by 
the caller. To prevent the entire code base from becoming infected with unsafe, 
a safe abstraction is required. For instance, the Box type in Rust’s standard 
library encapsulates a raw pointer and maintains the safety invariant that this 
pointer is always safe to dereference. As the pointer itself is inaccessible from 
the outside, this invariant cannot be violated and Box can thus provide a safe 
method for dereferencing it. The safety of this method is established entirely by 
local reasoning on the Box type and its safety invariant. 
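A minimal sketch of such a safe abstraction, assuming a hypothetical OwnedU32 type that mimics a tiny part of Box<u32>: the raw pointer field is private to its module, so the safety invariant can only be established by new, and get can therefore be a safe method. The fixed variant of bad_deref appears as deref_unchecked (a hypothetical name); it is marked unsafe, so the caller must guarantee that the pointer is valid.

```rust
mod owned {
    /// Hypothetical simplified stand-in for `Box<u32>`.
    /// Safety invariant: `ptr` comes from `Box::into_raw` and is freed only in `drop`.
    pub struct OwnedU32 {
        ptr: *mut u32, // private: code outside this module cannot break the invariant
    }

    impl OwnedU32 {
        pub fn new(value: u32) -> Self {
            OwnedU32 { ptr: Box::into_raw(Box::new(value)) }
        }

        /// Safe to call from Safe Rust: the invariant rules out dangling pointers.
        pub fn get(&self) -> u32 {
            // SAFETY: by the type's invariant, `ptr` points to a live allocation.
            unsafe { *self.ptr }
        }
    }

    impl Drop for OwnedU32 {
        fn drop(&mut self) {
            // SAFETY: `ptr` came from `Box::into_raw` and is released exactly once.
            unsafe { drop(Box::from_raw(self.ptr)) }
        }
    }
}

pub use owned::OwnedU32;

/// The repaired `bad_deref`: being `unsafe`, it moves the obligation that
/// `ptr` is valid to the caller.
pub unsafe fn deref_unchecked(ptr: *const u32) -> u32 {
    unsafe { *ptr }
}
```

The soundness argument for get only needs to inspect the owned module, which is exactly the local reasoning described above.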


3 Architecture and Implementation 


OxiDD’s architecture is highly modular. In Rust, crates serve as counterparts to
packages in languages such as Python, OCaml, or Haskell. OxiDD’s implementation
is split into multiple crates to encapsulate functionality and expose a public
versioned API. Fig. 4 shows how OxiDD is decomposed into separate crates and
how these depend on each other. Each crate has its own well-defined purpose.
The architecture is centered around the core crate, which mainly consists of trait
definitions that formalize the key concepts of DDs. Traits are Rust's equivalent
to interfaces or abstract classes in object-oriented programming. By using traits
to abstract from concrete implementations of key concepts, OxiDD achieves
its high degree of modularity. Notably, there are no dependencies between
algorithms and concrete implementations of data structures; all algorithms and
data structures are written in a generic way. To provide end users with default
implementations, e.g., for using OxiDD as a BDD library, there is the
oxidd crate, which assembles standard components that have proven useful in practice.

Fig. 4: OxiDD's architecture: dependency graph of the main crates (among them manager-index, cache, reorder, rules-bdd, rules-mtbdd, rules-zbdd, and rules-tdd).
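The idea of writing algorithms against traits instead of concrete data structures can be sketched as follows. The trait DdManager, its single method, and VecManager are hypothetical and far simpler than OxiDD's actual core traits; the point is only that node_count is written once and works for every type implementing the trait.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical, heavily simplified manager abstraction.
pub trait DdManager {
    /// Each manager chooses its own (untagged, in this sketch) edge representation.
    type Edge: Copy + Eq + Hash;
    /// Outgoing edges of the node an edge points to (empty for terminals).
    fn children(&self, edge: Self::Edge) -> Vec<Self::Edge>;
}

/// A generic algorithm: counts the nodes reachable from `root` for any manager.
pub fn node_count<M: DdManager>(manager: &M, root: M::Edge) -> usize {
    let mut seen = HashSet::new();
    let mut stack = vec![root];
    while let Some(e) = stack.pop() {
        if seen.insert(e) {
            stack.extend(manager.children(e));
        }
    }
    seen.len()
}

/// One concrete implementation: nodes stored in a vector, edges are indices.
pub struct VecManager {
    pub nodes: Vec<Vec<u32>>,
}

impl DdManager for VecManager {
    type Edge = u32;
    fn children(&self, edge: u32) -> Vec<u32> {
        self.nodes[edge as usize].clone()
    }
}
```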
3.1 The OxiDD Framework


Rather than being yet another DD library, OxiDD is a framework, owing to its
modular architecture (Fig. 4). Different implementations can be composed and
swapped out for alternatives. All functionality has clear interfaces and can be
separated into individually maintained and versioned crates. Third-party
contributors can easily develop crates for novel kinds of DDs, core data structures,
or reordering heuristics. Facilitated by OxiDD’s abstractions, those crates will
work seamlessly together, which makes OxiDD well suited for future research on DDs. In
this section, we provide further details on key concepts of this framework.


Manager. The manager is the data structure that stores all nodes of a DD and 
ensures their uniqueness via a unique table [11]. It also provides functionality for 
delayed garbage collection (GC), where the removal of nodes is delayed as far as 
possible. Removing nodes early would lower performance if they need to be
recreated later. An implementation of the manager trait also defines an edge type. An
edge is a reference to a node, and may additionally have a tag. Tags are used, 
e.g., to mark edges as complemented in BCDDs. An inner node consists of its 
outgoing edges and optionally a level number. The latter is required for most 
kinds of DDs but can be omitted, e.g., in quasi-reduced BDDs. 

OxiDD allows for different manager implementations. The manager-index 
crate contains a manager implementation that uses 32-bit unsigned integers to 
represent edges. These 32 bits are split into an index referencing a node and a
tag. If $2^{32}$ nodes are too limiting for a use case, it is well possible to implement
a different manager, e.g., one where nodes are referred to by pointers. In Fig. 4,
this is indicated by the dashed manager-pointer box. 
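A possible packing of a tagged edge into 32 bits, as needed, e.g., for BCDDs, is sketched below. The layout (complement tag in the least significant bit, 31-bit node index above it) is an assumption of this sketch, not necessarily OxiDD's; it shows why a single tag bit halves the number of addressable nodes to $2^{31}$ and why negation reduces to one XOR.

```rust
/// Hypothetical 32-bit BCDD edge: bit 0 is the complement tag,
/// bits 1..=31 hold the node index (so at most 2^31 nodes).
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub struct Edge(u32);

impl Edge {
    pub const MAX_INDEX: u32 = (1 << 31) - 1;

    /// Returns `None` if the index does not fit into 31 bits.
    pub fn new(index: u32, complemented: bool) -> Option<Edge> {
        if index > Self::MAX_INDEX {
            return None;
        }
        Some(Edge((index << 1) | complemented as u32))
    }

    pub fn index(self) -> u32 { self.0 >> 1 }

    pub fn is_complemented(self) -> bool { self.0 & 1 == 1 }

    /// Negation flips the tag bit; the referenced node is not touched.
    pub fn not(self) -> Edge { Edge(self.0 ^ 1) }
}
```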


Cache. Typically, each manager has an associated apply cache, which is re- 
quired by our recursive apply implementations for DD manipulation. Notably, 
the architecture of OxiDD is also open to other implementations, e.g., for a 
breadth-first apply algorithm (cf. [37,39]). The cache crate provides an apply 
cache as a fixed-size hash table. As with managers, alternative implementations 
of the apply cache are possible, and they can be freely composed with other 
implementations of the core infrastructure, e.g., managers. 


Functions. Recall that shared DDs represent multiple functions in a single data
structure (cf. Section 2). In the graphical DD representation (cf. Fig. 2), functions
correspond to the boxed labels f and g. From an implementation perspective, a
function is an edge paired with a reference to the manager storing the respective
node. For end users, functions provide a convenient interface for creating and
manipulating DDs.


Support for Various Kinds of DDs. The apply algorithms for the different
DD kinds are implemented in the crates whose names start with rules-. Besides the
reduction rules, these crates also define terminal node and edge-tag types. Depending
only on the abstractions provided by the core crate, other kinds of DDs can 
easily be implemented. Notably, implementations are also shielded from UB as 
they can be implemented entirely in safe code. 
```rust
let manager_ref = oxidd::bdd::new_manager(2048, 1024, 8);
let (x1, x2, x3) = manager_ref.with_manager_exclusive(|manager| {(
    BDDFunction::new_var(manager).unwrap(),
    BDDFunction::new_var(manager).unwrap(),
    BDDFunction::new_var(manager).unwrap(),
)});
let res = x1.and(&x2)?.or(&x3)?;
println!("{}", res.satisfiable());
```

Fig. 5: Constructing a BDD for $(x_1 \wedge x_2) \vee x_3$ with OxiDD's API.


Reordering. OxiDD provides the fundamental mechanism of swapping levels in 
DDs for variable reordering (cf. Section 2). Currently, the reorder crate implements
functionality to establish a given variable order, e.g., to harmonize the variable
orders of different DDs or to impose an order computed by a static heuristic. Support for
dynamic reordering, e.g., via sifting [35], is planned for OxiDD's next release. 


End User Ergonomics. The high degree of modularity achieved through
abstraction does not come at the expense of ergonomics for end
users. Fig. 5 shows an example of constructing a manager, creating three
variables, building the expression $(x_1 \wedge x_2) \vee x_3$, and then checking satisfiability.
Here, 2048 and 1024 are the capacities of the manager for nodes and the apply
cache, respectively, and 8 is the number of threads to use (see Line 1).

The method with_manager_exclusive is used to obtain exclusive access
to the manager, which is required for creating variables. Like existing libraries, OxiDD
offers functions for applying operators (Line 7) or checking satisfiability (Line 8).
Note that the interfaces provided by OxiDD shield users from UB, whether caused by
memory mismanagement or data races. Therefore, Fig. 5 does not contain a
single line of unsafe code. The question marks ? are part of Rust’s mechanism 
for handling errors, which may happen, e.g., when running out of memory. 


3.2 Design Choices and Defaults 


Implementing OxiDD, we also focused on providing a good set of default imple- 
mentations, selected and tuned for performance. 


Node Store. The manager-index crate implements a store for inner nodes as an
array, consisting of an initialized part followed by an uninitialized part. Each 
element of this array may either be a node along with a reference counter, a free 
slot with a reference to the next free slot, or uninitialized (see Fig. 6). 

When creating a new node, we first check if the linked list of free slots con- 
tains an element. If yes, this element is removed from the list and the node is 
stored there. Otherwise, the first uninitialized slot is used. Should there be no 
uninitialized slot in the array, then we return an out-of-memory error. When 
deleting a node, we prepend the node's slot to the list of free slots. 
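The slot handling described above can be sketched as follows. This hypothetical single-threaded version models the initialized part as a Vec whose spare capacity plays the role of the uninitialized part, chains free slots through a next field, and reports an out-of-memory error once the fixed capacity is exhausted.

```rust
/// Hypothetical slot of the node store: an occupied slot or a free slot
/// linked to the next free slot. Uninitialized slots are simply not yet in the Vec.
enum Slot<N> {
    Node { node: N, rc: u32 },
    Free { next: Option<usize> },
}

#[derive(Debug, PartialEq)]
pub struct OutOfMemory;

pub struct NodeStore<N> {
    slots: Vec<Slot<N>>,      // initialized part; slots.len() is the first uninitialized index
    capacity: usize,          // fixed size of the array
    free_head: Option<usize>, // head of the free slot list
}

impl<N: Copy> NodeStore<N> {
    pub fn with_capacity(capacity: usize) -> Self {
        NodeStore { slots: Vec::with_capacity(capacity), capacity, free_head: None }
    }

    pub fn add(&mut self, node: N) -> Result<usize, OutOfMemory> {
        if let Some(i) = self.free_head {
            // Reuse a free slot: pop it from the free list.
            if let Slot::Free { next } = self.slots[i] {
                self.free_head = next;
            }
            self.slots[i] = Slot::Node { node, rc: 1 };
            Ok(i)
        } else if self.slots.len() < self.capacity {
            // Use the first uninitialized slot.
            self.slots.push(Slot::Node { node, rc: 1 });
            Ok(self.slots.len() - 1)
        } else {
            Err(OutOfMemory)
        }
    }

    /// Prepends the slot to the free list.
    pub fn delete(&mut self, i: usize) {
        self.slots[i] = Slot::Free { next: self.free_head };
        self.free_head = Some(i);
    }

    pub fn get(&self, i: usize) -> Option<N> {
        match self.slots.get(i) {
            Some(Slot::Node { node, .. }) => Some(*node),
            _ => None,
        }
    }
}
```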
In a concurrent setting, both the first-uninitialized index and the free slot
list head are shared state requiring synchronization. To prevent contention, every
worker thread gets its own first-uninitialized index and free slot list. Instead of
incrementing the shared first-uninitialized index by 1, the worker pre-allocates
the slots until the next multiple of $2^{16}$. The free slot list is then split into multiple
lists of (approximately) $2^{19}$ elements. The shared state maintains an array of
these lists, while the workers have just one of these lists. If GC reaches $2^{18}$ nodes
for a worker, the local list is moved to the shared state. The large lists avoid
frequent synchronization with the shared state and thus contention.
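The chunked pre-allocation can be sketched with a single atomic counter. In this hypothetical version, the shared first-uninitialized index is only ever advanced in steps of $2^{16}$, so every chunk starts at a multiple of $2^{16}$, and a worker performs one atomic fetch_add per chunk rather than one per node. The handling of free slot lists is omitted.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Chunk size used in this sketch: 2^16 slots.
pub const CHUNK: usize = 1 << 16;

/// Hypothetical shared part of the node store.
pub struct SharedStore {
    first_uninit: AtomicUsize,
    capacity: usize,
}

/// Hypothetical per-worker state: the worker's pre-allocated range [next, end).
pub struct WorkerAlloc {
    next: usize,
    end: usize,
}

impl SharedStore {
    pub fn new(capacity: usize) -> Self {
        SharedStore { first_uninit: AtomicUsize::new(0), capacity }
    }
}

impl WorkerAlloc {
    pub fn new() -> Self {
        WorkerAlloc { next: 0, end: 0 }
    }

    /// Returns a fresh slot index, or `None` if the store is exhausted.
    pub fn alloc(&mut self, shared: &SharedStore) -> Option<usize> {
        if self.next == self.end {
            // One atomic read-modify-write per chunk; the returned start is unique.
            let start = shared.first_uninit.fetch_add(CHUNK, Ordering::Relaxed);
            if start >= shared.capacity {
                return None;
            }
            self.next = start;
            self.end = (start + CHUNK).min(shared.capacity);
        }
        let slot = self.next;
        self.next += 1;
        Some(slot)
    }
}
```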

Terminal nodes are managed independently of inner nodes. To distinguish 
between inner and terminal nodes, we split the 32-bit “address space" into two 
parts. The first N node IDs are used for terminal nodes, the remaining ones for 
inner nodes. The actual array index is the node ID minus N. For example, we set 
$N = 2$ in case of BDDs, where ID 0 is used for $\bot$ and ID 1 for $\top$. Determining
the value of a terminal node does not require any memory operation here. For 
MTBDDs, however, we have to store terminal nodes in a separate array, similar 
to the inner node store described above. 
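Decoding a node ID under this scheme takes one comparison and one subtraction. The following hypothetical helper assumes the BDD case with $N = 2$.

```rust
/// Number of terminal IDs reserved at the start of the ID space (BDD case).
pub const N_TERMINALS: u32 = 2;

#[derive(Debug, PartialEq)]
pub enum Decoded {
    Terminal(bool),
    Inner(usize), // index into the inner node store
}

/// Hypothetical decoding: IDs 0 and 1 are ⊥ and ⊤, every other ID maps to
/// inner node store index ID − N.
pub fn decode(id: u32) -> Decoded {
    if id < N_TERMINALS {
        Decoded::Terminal(id == 1) // the value follows from the ID alone: no memory access
    } else {
        Decoded::Inner((id - N_TERMINALS) as usize)
    }
}
```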


Reference Counting. For GC, we use reference counting instead of a mark-and-sweep
method. One reason for this design decision is level-local GC during
reordering: iterating through the entire DD for mark-and-sweep GC
is very expensive. It would be possible to only materialize reference counters
during reordering and use mark-and-sweep GC otherwise (implemented, e.g., in
BuDDy [27]). However, this does not resolve the following issue: GC must not
remove any objects that are referenced by local variables. In languages like C,
C++, and Rust, we cannot simply inspect the program stack. BuDDy resolves this
issue by using a second stack on which all locally referenced objects are registered. The
problem is that accidentally forgetting the registration may lead to use-after-free
bugs and ultimately UB. This would imply that apply algorithms need to
be written in Unsafe Rust, which is undesirable. Some solutions to this problem
have been discussed [22], but they have no advantage over plain reference counting in
case of DDs. Our preliminary benchmarks indicate that the share of runtime
spent on reference counting is in the order of 5 %. Given that mark-and-sweep
GC would probably not be zero-cost either, this seems acceptable.
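In Rust, reference counting composes well with local variables because counter updates can be tied to Clone and Drop. The following single-threaded sketch is hypothetical (a concurrent implementation would need atomic counters instead of Cell); it shows that a reference held by a local variable keeps its node's counter positive without any explicit registration, and that the counter is decremented automatically when the variable goes out of scope.

```rust
use std::cell::Cell;

/// Hypothetical store with one reference counter per node.
pub struct Store {
    rc: Vec<Cell<u32>>,
}

/// An owned reference to a node: creating or cloning it increments the
/// counter, dropping it decrements the counter.
pub struct EdgeHandle<'a> {
    store: &'a Store,
    node: usize,
}

impl Store {
    pub fn new(nodes: usize) -> Self {
        Store { rc: (0..nodes).map(|_| Cell::new(0)).collect() }
    }

    pub fn handle(&self, node: usize) -> EdgeHandle<'_> {
        self.rc[node].set(self.rc[node].get() + 1);
        EdgeHandle { store: self, node }
    }

    pub fn ref_count(&self, node: usize) -> u32 {
        self.rc[node].get()
    }
}

impl Clone for EdgeHandle<'_> {
    fn clone(&self) -> Self {
        self.store.handle(self.node)
    }
}

impl Drop for EdgeHandle<'_> {
    fn drop(&mut self) {
        let rc = &self.store.rc[self.node];
        rc.set(rc.get() - 1); // a GC may reclaim the node once this reaches 0
    }
}
```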


Unique Table. The unique table is split into multiple hash tables, one per 
level. This split is useful for reordering, where we need to iterate over all nodes 


Fig. 6: Node store array (binary nodes with level). Slots 42 to 46 contain: a free slot (next: 44), a node (then: 45, else: 7, level: 12, rc: 1), a free slot (next: 0), a node (then: 0, else: 1, level: 10, rc: 3), and an uninitialized slot.


266 N. Husung et al. 


on a level. Since we need to grow these tables on demand, we protect each table 
with a lock. The hash tables in use are designed with cache locality in mind. In 
particular, we use linear probing to resolve hash collisions. For space efficiency, 
the tables only contain IDs of the respective nodes, and not the nodes’ outgoing 
edges. To improve performance when resizing the hash table, which normally 
requires rehashing all nodes, we store the hash next to the node ID. Thus, we 
can avoid rehashing any nodes. We further truncate the hash to 31 bits, so we 
can use the same 32-bit integer to mark the bucket as empty or as tombstone. 
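A single-threaded sketch of such a per-level table follows. The sentinel values (u32::MAX for empty, u32::MAX - 1 for tombstones), the load factor of 3/4, and the eq callback that compares the node behind an ID with the node being looked up are assumptions of this sketch. The resize method shows the point of storing the truncated hash: buckets are moved to their new positions without touching any node.

```rust
/// Hypothetical per-level unique table storing (31-bit hash, node ID) pairs.
/// Values with the top bit set cannot be hashes and serve as markers.
const HIGH_BIT: u32 = 1 << 31;
const EMPTY: u32 = u32::MAX;
const TOMBSTONE: u32 = u32::MAX - 1;

#[derive(Clone, Copy)]
struct Bucket { hash: u32, id: u32 }

const EMPTY_BUCKET: Bucket = Bucket { hash: EMPTY, id: 0 };

pub struct LevelTable {
    buckets: Vec<Bucket>, // power-of-two length, linear probing
    live: usize,          // stored node IDs
    used: usize,          // live entries plus tombstones
}

impl LevelTable {
    pub fn with_capacity(capacity: usize) -> Self {
        let cap = capacity.max(4).next_power_of_two();
        LevelTable { buckets: vec![EMPTY_BUCKET; cap], live: 0, used: 0 }
    }

    fn truncate(hash: u64) -> u32 { hash as u32 & !HIGH_BIT }

    /// Returns the ID of an existing node for which `eq(id)` holds, or inserts `new_id`.
    pub fn get_or_insert(&mut self, hash: u64, eq: impl Fn(u32) -> bool, new_id: u32) -> u32 {
        if (self.used + 1) * 4 > self.buckets.len() * 3 {
            self.resize(self.buckets.len() * 2); // keep the load factor at most 3/4
        }
        let h = Self::truncate(hash);
        let mask = self.buckets.len() - 1;
        let mut i = h as usize & mask;
        let mut first_tombstone = None;
        loop {
            let b = self.buckets[i];
            if b.hash == EMPTY {
                let slot = match first_tombstone {
                    Some(t) => t,
                    None => { self.used += 1; i }
                };
                self.buckets[slot] = Bucket { hash: h, id: new_id };
                self.live += 1;
                return new_id;
            }
            if b.hash == TOMBSTONE {
                if first_tombstone.is_none() { first_tombstone = Some(i); }
            } else if b.hash == h && eq(b.id) {
                return b.id; // the stored hash filters out most non-matching nodes
            }
            i = (i + 1) & mask;
        }
    }

    pub fn remove(&mut self, hash: u64, id: u32) -> bool {
        let h = Self::truncate(hash);
        let mask = self.buckets.len() - 1;
        let mut i = h as usize & mask;
        loop {
            let b = self.buckets[i];
            if b.hash == EMPTY { return false; }
            if b.hash == h && b.id == id {
                self.buckets[i].hash = TOMBSTONE;
                self.live -= 1;
                return true;
            }
            i = (i + 1) & mask;
        }
    }

    /// Rebuilds the table from the stored hashes: no node has to be rehashed.
    fn resize(&mut self, new_capacity: usize) {
        let old = std::mem::replace(&mut self.buckets, vec![EMPTY_BUCKET; new_capacity]);
        let mask = new_capacity - 1;
        for b in old.into_iter().filter(|b| b.hash & HIGH_BIT == 0) {
            let mut i = b.hash as usize & mask;
            while self.buckets[i].hash != EMPTY { i = (i + 1) & mask; }
            self.buckets[i] = b;
        }
        self.used = self.live; // tombstones are dropped
    }

    pub fn len(&self) -> usize { self.live }
}
```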


Apply Cache. For the apply cache, we use a fixed-size hash table. Each entry 
consists of the operator ID, a fixed-size array of operands, and the result of the 
operation. To synchronize accesses on the table, we use a spinlock per bucket. 
On usual lookups, we do not wait in case another thread has the lock, we rather 
recompute the entry. When inserting a new entry, we always replace a previously 
present entry in the bucket. We also experimented with bucket sizes larger than 
one entry and replacement policies such as first in, first out (FIFO), and least 
frequently used (LFU), but these turned out to be slower than the direct-mapped 
apply cache. One reason might be that in our benchmarks, we generally observed 
rather few cache hits (in the order of 20-30 %). Larger bucket sizes would require 
checking more entries before concluding that an entry is not contained in the 
cache. In addition, FIFO and LFU do not account for the different costs of 
operations. Ideally, the apply cache would merely keep those entries that take 
more time to recompute and are also used frequently. We plan to investigate 
such a strategy in more detail in future work. Notably, such experiments are 
facilitated by the modular architecture of OxiDD. 
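The direct-mapped cache with non-blocking lookups might look as follows. In this hypothetical sketch, Mutex::try_lock stands in for the per-bucket spinlock, the hash function is an arbitrary multiplicative hash, and an insertion that finds its bucket locked is simply skipped; these are assumptions of the sketch, not details taken from OxiDD.

```rust
use std::sync::Mutex;

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Entry { op: u8, operands: [u32; 2], result: u32 }

/// Hypothetical direct-mapped apply cache: one entry per bucket, one lock per bucket.
pub struct ApplyCache {
    buckets: Vec<Mutex<Option<Entry>>>,
}

impl ApplyCache {
    pub fn new(capacity: usize) -> Self {
        let cap = capacity.max(1).next_power_of_two();
        ApplyCache { buckets: (0..cap).map(|_| Mutex::new(None)).collect() }
    }

    fn bucket(&self, op: u8, operands: [u32; 2]) -> &Mutex<Option<Entry>> {
        let mut h = op as u64;
        for x in operands {
            h = (h ^ x as u64).wrapping_mul(0x9E37_79B9_7F4A_7C15);
        }
        &self.buckets[(h >> 32) as usize & (self.buckets.len() - 1)]
    }

    /// Never waits: if the bucket is locked, report a miss so the caller recomputes.
    pub fn get(&self, op: u8, operands: [u32; 2]) -> Option<u32> {
        let guard = self.bucket(op, operands).try_lock().ok()?;
        match *guard {
            Some(e) if e.op == op && e.operands == operands => Some(e.result),
            _ => None,
        }
    }

    /// Always overwrites the bucket's previous entry (no replacement policy).
    pub fn insert(&self, op: u8, operands: [u32; 2], result: u32) {
        if let Ok(mut guard) = self.bucket(op, operands).try_lock() {
            *guard = Some(Entry { op, operands, result });
        }
    }
}
```

With a single bucket, every insertion evicts the previous entry, which makes the direct-mapped behaviour easy to observe.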

A particularly important optimization is to elide reference counter updates
when inserting or removing entries from the apply cache. The referenced nodes
are rarely in the CPU cache, so each counter update would likely cause a cache miss. Eliding reference counter updates implies
that we must ensure that no nodes are deleted while referenced from the apply 
cache. Nodes can only be deleted during GC and reordering. Since a GC may 
run in the background, we lock and empty all buckets of the apply cache prior to
the GC. Only after the GC, we unlock the buckets again. 


Concurrent Apply Algorithms. OxiDD has recursive apply algorithms, both 
in a single-threaded and a concurrent version. The concurrent version uses task-based
parallelism with work-stealing, similar to Sylvan [18]. The idea is to
execute the recursive calls (cf. Fig. 3) concurrently. For the implementation, we 
use the rayon crate [30]. As splitting the work into tasks comes with a runtime 
overhead (in the order of +35%), we only split the tasks until a certain recursion 
depth. From then on, we use the single-threaded apply algorithm. 
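The depth cutoff can be illustrated on a generic divide-and-conquer recursion. To keep the sketch dependency-free, it uses std::thread::scope instead of rayon, which spawns a new OS thread per task rather than using work stealing, and it sums a slice rather than computing a DD operation; only the pattern of running the two recursive calls concurrently up to a fixed depth and sequentially below it carries over.

```rust
use std::thread;

/// Sequential fallback used below the cutoff depth.
fn sum_sequential(xs: &[u64]) -> u64 {
    xs.iter().sum()
}

/// The two recursive calls run concurrently until `depth` reaches 0; below that,
/// the sequential version avoids the per-task overhead.
pub fn sum_parallel(xs: &[u64], depth: u32) -> u64 {
    if depth == 0 || xs.len() < 2 {
        return sum_sequential(xs);
    }
    let (left, right) = xs.split_at(xs.len() / 2);
    thread::scope(|s| {
        let left_task = s.spawn(move || sum_parallel(left, depth - 1));
        let right_sum = sum_parallel(right, depth - 1); // the current thread takes the other half
        left_task.join().unwrap() + right_sum
    })
}
```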


3.3 Safe Abstraction for Modifying Nodes 


A challenge when designing OxiDD was to find a safe abstraction for modifying
nodes, e.g., during reordering, as it requires synchronization. A lock per node
would lead to incorrect results when accessing nodes subject to a level swap, and
would moreover be detrimental to performance. Instead, we use a single read/write lock
to coordinate exclusive access to the entire DD. A shared append-only view is
sufficient for apply algorithms and most other operations such as model counting
or satisfiability checking. Reordering requires exclusive access.

```rust
func_a.with_manager_shared(|manager, edge_a /* &Edge<'1> */| {
    let edge_b /* &Edge<'1> */ = func_b.as_edge(manager);
    let edge_res /* Edge<'1> */ = apply_and(manager, edge_a, edge_b);
    BDDFunction::from_edge(manager, edge_res)
});
// Mixing branded types leads to a compiler error.
func_a.with_manager_shared(|manager_a, edge_a /* &Edge<'1> */| {
    func_b.with_manager_shared(|manager_b, edge_b /* &Edge<'2> */| {
        let edge_res = apply_and(manager_a, edge_a, edge_b); // <-- Error
        BDDFunction::from_edge(manager_a, edge_res)
    })
});
```

Fig. 7: Usage of branded types.

Once exclusive access is acquired, we must ensure that all nodes we modify 
actually belong to the respective manager we have exclusive access to. To this
end, a safety invariant is required: All descendants of a node are stored in the 
same manager. This is a very natural assumption, also needed for correctness,
as it prevents a node in manager A from referencing a node of manager B. As this
invariant is needed for safety, there must not be a way to violate it from Safe Rust.
The challenge is that when creating a node, there is no efficient way to check 
the invariant. After all, we only work with edges here, and edges do not (nec- 
essarily) provide any information about the manager the node belongs to. Only 
the function type stores both a node reference and a reference to the respective 
manager. So, before actually starting an apply operation, we must ensure that 
the operands (of function type) belong to the same manager, and the entire code 
in between needs to uphold the invariant. In a naïve implementation, without a 
proper abstraction, this would require a lot of unsafe code. 

We can drastically reduce the amount of unsafe code if every manager has its 
own edge and node types, as this prevents mixing edges from different managers. 
To realize this idea without fixing the number of managers upfront, we use 
branded types as presented by Yanovski et al. [42]. Branded types leverage Rust’s 
lifetimes. In Rust, a reference is essentially a pointer with the invariant that it 
is always safe to dereference. As references may point to stack variables, the 
compiler needs to make sure that the referenced variables do not go out of scope 
as long as the reference is live. This is done by adding a lifetime to reference 
types. The lifetime corresponds to the referenced variable’s scope. 

As an example, computing the conjunction of functions func_a and func_b 
works as in Lines 1-5 of Fig. 7. The with_manager_shared method acquires the 
lock (for shared access) of the manager referenced by func_a. Further, it takes a 
closure to which it passes the manager reference and edge. This is the place where 
the new brand/lifetime is introduced. We denote it as '1 in the comment. When
converting func_b into its underlying edge in Line 2, we check that it belongs to
manager. If this is not the case, we abort the execution with an appropriate error 
message. Otherwise, we obtain an edge of the same branded type as edge_a. This 
means that when calling the recursive apply_and function, it can safely assume 
that the nodes referenced by edge_a and edge_b, as well as all their descendants 
are stored in the same manager. This simply follows from type safety. As the 
branded type is only valid inside the closure, we convert the resulting edge back 
into a function in Line 4. Notably, if we nest with_manager_shared calls as 
shown in Lines 6-12 of Fig. 7, we get a compile time error because the types 
of edge_a and edge_b have different brands. This safe abstraction enables the 
implementation of apply algorithms entirely within safe code. 
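The core trick of branded types can be reproduced in a few lines. In the following hypothetical sketch (a toy, not OxiDD's API), with_manager accepts a closure that must work for every lifetime 'id, so each call creates a fresh brand; the PhantomData of type fn(&'id ()) -> &'id () makes 'id invariant, so edges of one brand cannot be converted into edges of another. The commented-out lines show the kind of code the compiler rejects.

```rust
use std::marker::PhantomData;

// `fn(&'id ()) -> &'id ()` is invariant in `'id`: the compiler may neither
// shrink nor grow the brand, so two distinct brands never unify.
type Brand<'id> = PhantomData<fn(&'id ()) -> &'id ()>;

/// Hypothetical toy manager: node i stores the indices of its children.
pub struct Manager { nodes: Vec<Vec<usize>> }

/// Access to a manager, tagged with the brand `'id`.
pub struct ManagerRef<'id> { nodes: &'id mut Vec<Vec<usize>>, _brand: Brand<'id> }

/// A node reference that is only usable with the manager of the same brand.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Edge<'id> { index: usize, _brand: Brand<'id> }

impl Manager {
    pub fn new() -> Self { Manager { nodes: Vec::new() } }

    /// `f` must work for *all* `'id`, so every call introduces a fresh brand.
    /// The result type `R` cannot mention `'id`, so branded edges cannot escape.
    pub fn with_manager<R>(&mut self, f: impl for<'id> FnOnce(ManagerRef<'id>) -> R) -> R {
        f(ManagerRef { nodes: &mut self.nodes, _brand: PhantomData })
    }
}

impl<'id> ManagerRef<'id> {
    /// All children carry the brand of `self`, so "all descendants are stored
    /// in this manager" holds by construction, without runtime checks.
    pub fn make_node(&mut self, children: &[Edge<'id>]) -> Edge<'id> {
        self.nodes.push(children.iter().map(|e| e.index).collect());
        Edge { index: self.nodes.len() - 1, _brand: PhantomData }
    }

    pub fn num_children(&self, e: Edge<'id>) -> usize { self.nodes[e.index].len() }
}

// Rejected by the compiler (brands of r1 and r2 differ):
// m1.with_manager(|mut r1| {
//     m2.with_manager(|mut r2| {
//         let a = r1.make_node(&[]);
//         r2.make_node(&[a]);
//     })
// });
```

This is the same lifetime-branding technique as in GhostCell [42]; a runtime check is only needed at the boundary where an unbranded function is converted into a branded edge, as with as_edge in Fig. 7.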


4 Evaluation 


OxiDD is designed not only for modularity and safety, but also with performance 
in mind. We (mostly) use zero-cost abstractions and eliminate runtime checks
via type invariants. Our evaluation is driven by two research questions: 


RQ1 How does the single-threaded runtime of OxiDD compare to other popular 
BDD libraries? 
RQ2 Can OxiDD achieve a speed similar to Sylvan's in the multithreaded setting?


As the set of libraries we compare against, we choose BuDDy 2.4, CUDD 3.0.0, 
and Sylvan 1.8.0 since these are the most popular libraries. Furthermore, we compare
against LibBDD 0.5.10, a relatively mature Rust library, and Adiar (commit
ca4f7351), which appears to be the most performant external-memory library at
large scale. The version of OxiDD corresponds to commit 8113c12. Among
this set of libraries, Sylvan and OxiDD are the only multithreaded libraries. For a
fair comparison, we integrated OxiDD into the bdd-benchmark framework³ initially
developed by Steffan Sølvsten for the evaluation of Adiar [39]. It contains
the following set of combinatorial and verification benchmarks: 


— N-Queens: Given $N \in [12, 15]$, how many ways are there to place N queens
on an N × N chess board such that no two queens threaten each other?

— Tic-Tac-Toe: Given $N \in [20, 24]$, how many ways are there for player 1 to
place N crosses in a 3D 4 × 4 × 4 cube and tie if player 2 places noughts in
all remaining positions?

— Picotrav: Given a hierarchical circuit, a BDD is constructed for each output. 
We use this to verify the equality of two circuits. In our case, the circuits 
are a subset of the EPFL combinational benchmark suite [3]. 


Input sizes and files are selected based on preliminary experiments regarding 
resource consumption. Note that complement edges are not beneficial for N- 
Queens and Tic-Tac-Toe: Negations occur on variables only, the remaining op- 
erations are just conjunctions and disjunctions. This is different for Picotrav. 


3 github.com/SSoelvsten/bdd-benchmark, our version is available at Zenodo [24]. 
bdd-benchmark is designed in a way that is generic over the respective BDD
library. All benchmarks are written against an abstract adapter that provides 
operations such as conjunction, disjunction, and negation in case of BDDs. This 
means that the same operations are executed with the same variable order, 
regardless of the DD library in use. In particular, dynamic reordering is disabled. 
Note that bdd-benchmark is written in C++, so OxiDD’s adapter makes use of 
the C++ bindings. All libraries except BuDDy use complemented edges. Only 
OxiDD implements both BDDs and BCDDs, as the genericity easily allows us 
to do so. Since both implementations are based on the same data structures, 
we also get a relatively good estimate of the performance impact complemented 
edges have. For the remainder of this section, we use “OxiDD” to refer to the
BDD implementation and explicitly add “BCDD” otherwise.

We executed the benchmarks on a 16 core / 32 thread AMD Ryzen 9 5950X 
CPU with 128 GiB of RAM and approximately 800 GiB free SSD space, running 
Ubuntu 22.04 (Linux kernel 5.15). The libraries were compiled using Clang 16.0.6 
or rustc 1.71.1, which are both based on LLVM 16. We set a timeout of 3 hours. 
To reduce the number of TLB misses during execution, we enabled transparent
hugepages by setting /sys/kernel/mm/transparent_hugepage/enabled to
always. On many systems, the default is madvise, i.e., programs have to request
huge pages via respective madvise calls. OxiDD is the only library that does this to some extent. The
performance impact of this setting is quite large: In preliminary experiments we 
observed a 1.6x speedup for 14-Queens with BuDDy. We ran each benchmark 
three times and report the average running times. 


4.1 RQ1: Single-thread Performance


Overall, our benchmarks show that for single-threaded execution, BuDDy performs
best. OxiDD is slightly slower than BuDDy but faster than all other libraries. In Fig. 8a,
we show the runtimes on the N-Queens benchmark relative to OxiDD. OxiDD
takes 4.2 s for N = 12, 24.6 s for N = 13, 2.4 min for N = 14, and 16.1 min
for N = 15. On 15-Queens, OxiDD performs best. BuDDy runs out of memory,
mainly due to its limitation to $2^{31} - 1$ nodes. As the BDD construction produces
more than $2^{31}$ nodes, it only succeeds with sufficiently many GCs. OxiDD
(BCDD) is restricted to $2^{31}$ nodes (the last bit is needed for complement edges),
and the GCs cause OxiDD (BCDD) to be much slower than OxiDD in this specific
benchmark instance. Still, OxiDD (BCDD) is faster than CUDD, Sylvan,
and LibBDD. For this problem size, breadth-first apply algorithms also start to
shine: Adiar is only 1.03x slower than OxiDD.

The situation is very similar for the Tic-Tac-Toe problem. For Picotrav, how- 
ever, complement edges may have a notable impact on the node count. On many 
instances, the BCDD variant of OxiDD performs slightly better than its BDD 
variant and BuDDy, see Fig. 8b. All libraries solved the smallest 21 out of 23 
instances, the remaining two timed out or ran out of memory. 

So with respect to RQ1, we can say that OxiDD is among the best libraries. 
However, a manager implementation that is not restricted to $2^{31}$ or $2^{32}$ nodes,
respectively, might be interesting for some use cases.
4.2 RQ2: Multi-thread Performance


From Fig. 8e, we observe that OxiDD’s parallelization is already effective in 
its initial release. However, for increasing number of threads, Sylvan performs 
better. This is probably because locking each level of OxiDD's unique table
leads to contention. 14-Queens has 196 variables/levels, so it is not that unlikely
that two out of 32 threads try to acquire the same lock. Notably, OxiDD's 
performance for 32 threads is slightly worse than for 16 threads. Especially for the 
smaller Picotrav instances (cf. Fig. 8d), we also observe a significant slowdown 
using 32 threads. Sylvan shows a slowdown as well, but not as serious as OxiDD. 
Only for the largest solved instance, Sylvan has a significant speedup of 10.5x 
for 32 threads (cf. Fig. 8f). 

Regarding RQ2, we conclude that Sylvan's highly optimized parallel engine 
leads to better performance on a high number of threads. In large combinatorial 
problems with at most 16 threads, OxiDD's parallelization outperforms Sylvan's. 
For the verification problems we tested, the current implementation does not 
achieve parallel speedups. Still, we remark that OxiDD in single-threaded exe- 
cution significantly outperforms the multithreaded Sylvan in all but one Picotrav 
instance. Note that OxiDD's parallelization can still be optimized, e.g., by using 
concurrent hash tables to counter the contention issues mentioned above (cf. Section 3.2). 


5 Conclusion 


In this paper, we have presented OxiDD, a new decision diagram framework 
in Rust. OxiDD emphasizes modularity, which eases extending its functionality 
and adding new kinds of decision diagrams. Our implementations offer 
high performance and can safely be used in concurrent contexts. Depending on 
the workload, there may also be significant speedups in multithreaded execu- 
tion. We demonstrated this by comparing OxiDD's B(C)DD implementations to 
other popular BDD libraries. Moreover, we showed how we can leverage Rust’s 
type system to ensure that edges from different managers cannot accidentally 
be mixed up. This allowed us to implement the building blocks for dynamic 
reordering while keeping the apply algorithms entirely in Safe Rust. 

As OxiDD aims to be a basis for future research and development, there are plenty of 
opportunities. First, OxiDD's B(C)DD, MTBDD, and ZBDD implementations 
are not yet as feature-rich as mature BDD packages such as CUDD. Adding 
the remaining operations is, however, facilitated by our modular design. Second, 
we pointed out that the current unique table is likely to be a bottleneck for 
concurrent performance. Recently, there have been interesting developments on 
growing concurrent hash tables [29], which we plan to investigate further. Third, 
we plan to implement dynamic reordering heuristics relying on the reordering 
building blocks presented here. Last but not least, the argument that our unsafe 
code upholds Rust’s invariants is currently informal. Formally verifying OxiDD 
would be a challenging but rewarding avenue to pursue. 
Abstract. We examine verification of concurrent programs under the 
total store ordering (TSO) semantics used by the x86 architecture. In our 
model, threads manipulate variables over infinite domains and they can 
check whether variables are related for a range of relations. We show that, 
in general, the control state reachability problem is undecidable. This 
result is derived through a reduction from the state reachability problem 
of lossy channel systems with data (which is known to be undecidable). 
In light of this undecidability, we turn our attention to a more 
tractable variant of the reachability problem. Specifically, we study con- 
text bounded runs, which provide an under-approximation of the pro- 
gram behavior by limiting the possible interactions between processes. 
A run consists of a number of contexts, with each context representing 
a sequence of steps where only a single designated thread is active. We 
prove that the control state reachability problem under bounded context 
switching is PSPACE complete. 


1 Introduction 


Over the years, research on concurrent verification has been chiefly conducted 
under the premise that the threads run according to the classical Sequential 
Consistency (SC) semantics. Under SC, the threads operate on a set of shared 
variables through which they communicate atomically, i.e., read and write op- 
erations take effect immediately. In particular, a write operation is visible to all 
the threads as soon as the writer thread carries out its operation. Therefore, 
the threads always maintain a uniform view of the shared memory: they all see 
the latest value written on any given variable and we can interpret program 
runs as interleavings of sequential thread executions. Although SC has been 
immensely popular as an intuitive way of understanding the behaviours of con- 
current threads, it is not realistic to assume computation platforms guarantee SC 
anymore. The reason is that, due to hardware and compiler optimizations, most 
modern platforms allow more relaxed program behaviours than those permitted 
under SC, leading to so-called weak memory models. Weakly consistent platforms 
are found at all levels of system design, such as multiprocessor architectures (e.g., 
[33,32]), cache protocols (e.g., [31,19]), language-level concurrency (e.g., [24]), 
and distributed data stores (e.g., [17]). Program behaviours change dramatically 
when moving from the SC semantics to weaker semantics. Therefore, in recent 
years, research on the verification of concurrent programs under weak memory 
models has become popular. A classical example of a weak memory 
model is the Total Store Ordering (TSO) semantics, which is a formalization 
of the Intel x86 processor architecture [29]. The TSO semantics inserts an un- 
bounded FIFO buffer, called the store buffer, between each thread and the main 
memory. When a thread performs a write instruction, the corresponding opera- 
tion is appended to the end of its buffer, and hence it is not immediately visible 
to other threads. The write messages are non-deterministically propagated from 
the store buffer of a given thread to the shared memory. Verification of pro- 
grams that contain data races needs to take the underlying memory model into 
account. This is crucial in hardware-close programming, especially in concurrent 
libraries or kernels. Such applications are inherently racy; exploiting racy weak 
memory model (WMM) operations for efficiency is standard practice. Our work 
serves as a foundation for ensuring the correctness of such systems, which often 
rely on these intricate memory models to achieve optimal performance. 

© The Author(s) 2024 
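To make the effect of store buffers concrete, the following Python sketch exhaustively explores the classic store-buffering (SB) litmus test, which is a standard example and not taken from this paper. Thread 0 writes x := 1 and then reads y, while thread 1 writes y := 1 and then reads x. Under SC, the outcome in which both reads return 0 is impossible. Under a TSO-style model with one FIFO buffer per thread, it is reachable. The state encoding (program counters and tuple-based buffers) is our own simplification.

```python
from collections import deque

# Store-buffering (SB) litmus test: each thread writes 1 to its own
# variable and then reads the other thread's variable into a register.
PROGRAMS = {
    0: [("wt", "x", 1), ("rd", "y", "r0")],
    1: [("wt", "y", 1), ("rd", "x", "r1")],
}


def step(items, key, value):
    """Return a sorted tuple of (key, value) pairs with `key` updated."""
    d = dict(items)
    d[key] = value
    return tuple(sorted(d.items()))


def outcomes(tso):
    """Return all final (r0, r1) pairs; tso=False gives SC behaviour."""
    # State: (program counters, buffers, memory, registers).
    init = ((0, 0), ((), ()), (("x", 0), ("y", 0)), ())
    seen, todo, finals = {init}, deque([init]), set()
    while todo:
        pcs, bufs, mem, regs = todo.popleft()
        succs = []
        for t in (0, 1):
            if bufs[t]:
                # Memory update: propagate the oldest buffered write of t.
                (var, val), rest = bufs[t][0], bufs[t][1:]
                nbufs = tuple(rest if u == t else b for u, b in enumerate(bufs))
                succs.append((pcs, nbufs, step(mem, var, val), regs))
            if pcs[t] == len(PROGRAMS[t]):
                continue
            kind, var, arg = PROGRAMS[t][pcs[t]]
            npcs = tuple(pc + 1 if u == t else pc for u, pc in enumerate(pcs))
            if kind == "wt" and tso:
                # TSO: the write is appended to the end of t's FIFO buffer.
                nbufs = tuple(b + ((var, arg),) if u == t else b
                              for u, b in enumerate(bufs))
                succs.append((npcs, nbufs, mem, regs))
            elif kind == "wt":
                # SC: the write updates the shared memory immediately.
                succs.append((npcs, bufs, step(mem, var, arg), regs))
            else:
                # Read the latest own buffered write to var, else the memory.
                own = [v for (w, v) in bufs[t] if w == var]
                val = own[-1] if own else dict(mem)[var]
                succs.append((npcs, bufs, mem, step(regs, arg, val)))
        if not succs:
            final_regs = dict(regs)
            finals.add((final_regs["r0"], final_regs["r1"]))
        for s in succs:
            if s not in seen:
                seen.add(s)
                todo.append(s)
    return finals


print(sorted(outcomes(tso=False)))  # [(0, 1), (1, 0), (1, 1)]
print(sorted(outcomes(tso=True)))   # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The extra outcome (0, 0) arises when both writes are still buffered while both reads consult the shared memory.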


In a parallel development, significant research has been done on extending 
model checking frameworks to programs with infinite state spaces. There are 
two main reasons why a program might have an infinite state space. The first 
is that the program has unbounded control structures, which means it can have 
an unbounded number of threads. Examples include parameterized systems, in 
which correctness of the system is checked regardless of the number of threads, 
and programs that allow dynamic thread creation through spawning [11]. Sec- 
ondly, the program may operate on unbounded data structures, such as clocks 
[12], stacks [16], and queues [10,1]. These works, including their extensions, 
assume the SC semantics. Although recent works have started 
to explore parameterized verification for weak memory models [6,4,22], the ver- 
ification of programs that operate on a shared unbounded data structure under 
weak memory semantics has remained unexplored until now. 


In this paper, we combine infinite-state programs with weak memory mod- 
els: we study the decidability and complexity of the reachability problem for 
programs operating on unbounded data structures under the TSO semantics. 
While the TSO semantics has been extensively studied (e.g., [15,5]), it has been 
assumed that the data domain is finite. This means that the possible values of 
a shared variable or a register are bounded. In contrast, our model allows for an 
infinite domain such as the natural numbers $\mathbb{N}$ or the real numbers $\mathbb{R}$. It contains register 
assignments, an operator that may assign an arbitrary value to a register, and a 
set of relations that act as guards. We focus on the relations equality and "greater 
than" on totally ordered sets, as well as their combinations, negations, and inversions. 
Our model finds practical utility in continuously running concurrent pro- 
tocols. A prime example is the bakery ticket protocol used in various scenarios, 
which is presented in Section 4. Here, an unbounded number of requests occur, each 
is assigned an increasing number, and the lowest-numbered request is serviced. This 
presents a scenario with inherent races that requires an infinite domain, which 
our model can effectively capture. Note that our model is infinite in multiple 
dimensions: the threads are infinite-state as they operate on unbounded data 
domains, the store buffers are unbounded, and they carry write messages over 
an unbounded domain. 

In order to perform safety verification, we need to decide whether there is an 
execution that can reach some undesirable control state. We study the control 
state reachability problem and show that for many domains and relations, it is 
undecidable. Therefore, we propose an alternative approach by introducing an 
under-approximation schema using context bounding [30,28,25,23,14]. Context 
bounding has been proposed as a suitable approach for efficient bug de- 
tection in multithreaded programs. Indeed, for concurrent programs, a bounding 
concept that provides both good coverage and scalability must be based on as- 
pects related to the interactions between concurrent components. It has been 
shown experimentally that concurrency bugs usually show up after a small num- 
ber of context switches [28]. In this work, we study a context bounded analysis 
where only the active thread may perform operations and update the memory. 
We show that in this case, the state reachability problem is not only decidable 
but even PSPACE complete. To this end, we perform a two-step abstraction 
that employs insights about context bounded runs of TSO semantics as well as 
the structure of reachable configurations. 

In the first step of our abstraction process, we refine the methods introduced 
by [14]. Their construction introduces a code-to-code translation that abstracts 
the buffer, simplifying the problem to state reachability under SC. Our approach 
leverages the fact that this abstraction does not explicitly depend on variable 
values. In our case, the abstraction step yields a register machine where the reg- 
ister values are integers or real numbers, and the transitions are conditioned by 
"gap constraints" [9,18,27]. Gap constraints serve to identify, within each system 
configuration, (i) the variables with identical values and (ii) the gaps (differ- 
ences) between variable values. Notably, these gaps can be arbitrarily large. The 
papers [9,18,27] analyze programs with gap constraints within the framework of 
well-structured systems [8,20]. As a result, they do not provide upper bounds on 
the complexity. 

As another key contribution of this paper, we propose a method to achieve 
PSPACE completeness. The fundamental idea behind our algorithm is that for 
any system execution, there is an alternative execution with larger gaps among 
the variables. This implies that we do not need to explicitly track the gaps 
between variables, as is done in [9,18,27]. Instead, we implement a second 
(precise) abstraction step, focusing solely on the order of variables. For any pair 
of variables $x$ and $y$, we record whether $x = y$, $x < y$, or $x > y$. 
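The second abstraction step keeps only the pairwise order between variables. The following is a minimal Python sketch of such an order abstraction. It assumes that a valuation is given as a dictionary from variable names to numbers, and the function name `order_abstraction` is ours. It shows that two valuations with very different gaps collapse to the same abstract state. Since there are only three possible facts per pair, the number of abstract states is finite for a fixed set of variables.

```python
from itertools import combinations


def order_abstraction(valuation):
    """Map a valuation to the set of pairwise order facts (x, rel, y)."""
    facts = set()
    for x, y in combinations(sorted(valuation), 2):
        a, b = valuation[x], valuation[y]
        rel = "=" if a == b else ("<" if a < b else ">")
        facts.add((x, rel, y))
    return frozenset(facts)


small_gaps = {"x": 1, "y": 2, "z": 2}
large_gaps = {"x": 5, "y": 900, "z": 900}
print(order_abstraction(small_gaps) == order_abstraction(large_gaps))  # True
print(sorted(order_abstraction(small_gaps)))
# [('x', '<', 'y'), ('x', '<', 'z'), ('y', '=', 'z')]
```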
2 Related Work 


Not much current work considers the complexity and decidability of infinite-state 
programs on weak memory models. Furthermore, most existing works con- 
sider parameterized verification rather than programs with infinite data domains. 
The paper [6] considers parameterized verification of programs running under 
TSO, and shows that the reachability problem is PSPACE complete. However, 
the work assumes that the threads are finite-state and, in particular, the threads 
do not manipulate unbounded data domains. The paper [22] shows PSPACE 
completeness when the underlying semantics is the Release-Acquire fragment of 
C11, which differs from TSO. The paper also considers finite-state threads. 

In [2], parameterized verification of programs running under TSO is con- 
sidered. However, the paper applies the framework of well-structured systems 
where the buffers of the threads are modelled as lossy channels, and hence the 
complexity of the algorithm is non-primitive recursive. In particular, the paper 
does not give any complexity bounds for the reachability problem (or any other 
verification problems). The paper [15] considers checking the robustness prop- 
erty against SC for parameterized systems running under the TSO semantics. 
However, the robustness problem is entirely different from reachability and the 
techniques and results developed in this work cannot be applied in our setting. 

The paper [4] considers parameterized verification under the TSO semantics 
when the individual threads are infinite-state. However, the authors study a 
restricted model, which assumes that (i) all threads are identical and (ii) the 
threads do not use atomic operations. Generally, parameterized verification for 
the restricted model is easier than non-parameterized verification. For instance, 
in the case of TSO where the threads are finite-state, the restricted parameterized 
verification problem is in PSPACE [6] while the non-parameterized problem has 
a non-primitive recursive complexity [13]. 

There are many works on extending infinite-state systems with unbounded 
data domains. Well-studied examples are Petri nets with data tokens [27], stacks 
with unbounded stack alphabets [7], and lossy channel systems with unbounded 
message alphabets [1]. All these works assume the SC semantics and are hence 
orthogonal to this work. 


3 Total Store Order (TSO) 


Let $\mathbb{B} = \{true, false\}$. Given a function $f : A \to B$ with $a \in A$ and $b \in B$, $f[a \leftarrow b]$ 
is defined as follows: $f[a \leftarrow b](a) := b$ and $f[a \leftarrow b](a') := f(a')$ for any $a' \in A$ with 
$a' \neq a$. We write $x \in w$ for a letter $x \in \Sigma$ occurring in a word $w \in \Sigma^*$, and $w' \preceq w$ 
for $w' \in \Sigma^*$ being a subsequence of $w$. 

Let $x$ and $y$ be two natural (real) numbers and let $n \in \mathbb{N}$. We use $x <_n y$ (resp. 
$x \leq_n y$) to denote that $x + n < y$ (resp. $x + n \leq y$). A data theory is defined 
by a pair $(D, \mathit{Rl})$ where $D$ is an infinite data domain and $\mathit{Rl}$ is a 
finite set of relations $rl : D \times D \to \mathbb{B}$ over $D$. In this paper, we restrict ourselves to the 
natural/real numbers as data domain, and to sets of relations $\mathit{Rl}$ that are subsets 
of $\mathit{Rl}_{<_n} := \{=, \neq, <, \leq, <_n, \leq_n \mid n \in \mathbb{N}\}$. We assume w.l.o.g. that $0 \in D$. 


Transition Systems. A labelled transition system is a tuple $TS = (\Gamma, L, T, \gamma_{init})$ 
that consists of a set of configurations $\Gamma$, a finite set of labels $L$, a labelled 
transition relation $T \subseteq \Gamma \times L \times \Gamma$, and an initial configuration $\gamma_{init} \in \Gamma$. We 
write $\gamma \xrightarrow{\ell} \gamma'$ for $(\gamma, \ell, \gamma') \in T$. We say that $\pi = t_1 \ldots t_n \in T^*$ is a run of $TS$ if 
there is a sequence of configurations $\gamma_1, \gamma_2, \ldots, \gamma_{n+1}$ such that $t_i = \gamma_i \xrightarrow{\ell_i} \gamma_{i+1}$ 
for $i \leq n$ and $\gamma_1 = \gamma_{init}$. The run $\pi$ ends in configuration $\gamma_{n+1}$. We say that $\gamma$ is 
reachable if there is a run $\pi$ of $TS$ that ends in $\gamma$. 
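For a finite labelled transition system, the notions of run and reachability can be illustrated with breadth-first search. The sketch below assumes that the transition relation is given explicitly as a set of triples (γ, ℓ, γ'). The TSO transition systems in this paper are infinite, so this only illustrates the definitions; it is not a decision procedure for them. The function `find_run` is a hypothetical helper that returns a witness run as a list of transitions, or `None`.

```python
from collections import deque


def find_run(transitions, init, target):
    """Return a run (list of transitions) from init ending in target, or None."""
    succ = {}
    for (src, label, dst) in transitions:
        succ.setdefault(src, []).append((src, label, dst))
    parent = {init: None}  # configuration -> transition used to reach it
    todo = deque([init])
    while todo:
        gamma = todo.popleft()
        if gamma == target:
            run = []
            while parent[gamma] is not None:
                run.append(parent[gamma])
                gamma = parent[gamma][0]  # step back to the source
            return run[::-1]
        for t in succ.get(gamma, []):
            if t[2] not in parent:
                parent[t[2]] = t
                todo.append(t[2])
    return None


T = {("c0", "a", "c1"), ("c1", "b", "c2"), ("c2", "a", "c0"), ("c3", "a", "c0")}
print(find_run(T, "c0", "c2"))  # [('c0', 'a', 'c1'), ('c1', 'b', 'c2')]
print(find_run(T, "c0", "c3"))  # None
```

The empty run `[]` witnesses that the initial configuration itself is reachable, which matches the case n = 0 of the definition.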


Programs. A concurrent program $Prog$ consists of a finite set of threads $\mathcal{T}$. Each 
thread $t \in \mathcal{T}$ is a finite-state machine that works on its own set of local registers 
$R_t$. The local registers of different threads are disjoint. Let $R = \bigcup_{t \in \mathcal{T}} R_t$. The 
threads communicate over a finite set of shared variables $\mathcal{X}$. The registers and 
the shared variables take their values from a data theory $(D, \mathit{Rl})$. Formally, a 
thread is a tuple $t = (Q_t, R_t, \Delta_t, q^{init}_t)$ where $Q_t$ is a finite set of states of thread 
$t$, $q^{init}_t \in Q_t$ is the initial state of $t$, and $\Delta_t \subseteq Q_t \times Op \times Q_t$ is a finite set 
of transitions that change the state and execute an operation $op \in Op$. Let 
$x \in \mathcal{X}$ and $r_1, r_2 \in R_t$. A transition $\delta \in \Delta_t$ is a tuple $\delta = (q, op, q')$ where the 
operation $op \in Op$ has one of the following forms: (1) $r_1 := r_2$ assigns the value 
of register $r_2$ to register $r_1$, (2) $r_1 := \circledast$ non-deterministically assigns a value to 
register $r_1$, (3) $rl(r_1, r_2)$ checks if the values of the two registers $r_1$ and $r_2$ satisfy 
the relation $rl \in \mathit{Rl}$, (4) $rd(x, r_1)$ reads the value of shared variable $x$ and stores 
it in register $r_1$, (5) $wt(x, r_1)$ writes the value of register $r_1$ to shared variable 
$x$, and (6) $arw(x, r_1, r_2)$ is the atomic read-write operation, which atomically 
executes a read followed by a write. 


TSO Semantics. The TSO memory model [33] is used by the x86 processor ar- 
chitecture. Each thread has its own FIFO write buffer. Write operations $wt(x, r)$ 
in a thread $t$ do not update the memory immediately; if $d \in D$ is the value of 
$r$, then $(x, d)$ is appended to the buffer of $t$. The buffer contents are propagated to 
the shared memory non-deterministically. A read operation $rd(x, r)$ in $t$ accesses 
the latest write to $x$ in the buffer of $t$. In case there is no such write, it accesses the 
shared memory. For the atomic read-write operation $arw(x, r_1, r_2)$ in thread $t$, 
the buffer of $t$ must be empty ($\varepsilon$), and the value of $x$ in the memory must be 
the same as the value of $r_1$. Then $x$ is set to the value of $r_2$. 
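The following Python class is a minimal sketch of the memory-related rules just described. It assumes integer values and leaves the non-deterministic choice of when to propagate a buffered write to the caller. The class and method names are ours, not the paper's. Each buffer keeps its oldest write at the left end, so a memory update pops from the left, while a read scans from the newest write backwards.

```python
from collections import deque


class TSOMemory:
    """Shared memory plus one FIFO store buffer per thread (oldest write left)."""

    def __init__(self, variables, threads):
        self.mem = {x: 0 for x in variables}       # Mem: every variable starts at 0
        self.buf = {t: deque() for t in threads}   # Buf: every buffer starts empty

    def write(self, t, x, d):
        """wt(x, r) with RVal(r) = d: buffer (x, d); memory is unchanged."""
        self.buf[t].append((x, d))

    def read(self, t, x):
        """rd(x, r): latest write to x in t's own buffer, else shared memory."""
        for y, d in reversed(self.buf[t]):
            if y == x:
                return d
        return self.mem[x]

    def update(self, t):
        """Memory update: propagate t's oldest buffered write, if any."""
        if not self.buf[t]:
            return False
        x, d = self.buf[t].popleft()
        self.mem[x] = d
        return True

    def arw(self, t, x, expected, new):
        """arw(x, r1, r2) with RVal(r1) = expected and RVal(r2) = new."""
        # Enabled only if t's buffer is empty and Mem(x) equals RVal(r1).
        if self.buf[t] or self.mem[x] != expected:
            return False
        self.mem[x] = new
        return True


m = TSOMemory(["x"], ["t1", "t2"])
m.write("t1", "x", 5)
print(m.read("t1", "x"), m.read("t2", "x"))  # 5 0
print(m.arw("t1", "x", 0, 7))                # False: t1's buffer is not empty
m.update("t1")
print(m.read("t2", "x"))                     # 5
```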

Formally, the TSO memory model is a labelled transition system. A configu- 
ration $\gamma$ is defined as a tuple $\gamma = (St, RVal, Buf, Mem)$ where $St : \mathcal{T} \to \bigcup_{t \in \mathcal{T}} Q_t$ 
maps each thread to its current state, $RVal : R \to D$ maps each register to 
its current value, and $Buf : \mathcal{T} \to (\mathcal{X} \times D)^*$ maps each thread to the content of its buffer, 
which is a sequence of writes. Finally, $Mem : \mathcal{X} \to D$ maps each 
shared variable to its current value in the memory. The initial configuration of 
$Prog$ is defined by a tuple $\gamma_{init} = (St_{init}, RVal_{init}, Buf_{init}, Mem_{init})$ where $St_{init}$ maps 
each thread $t$ to its initial state $q^{init}_t$, $RVal_{init}$ and $Mem_{init}$ assign the value 0 to all registers 
and shared variables, and $Buf_{init}$ initializes all thread buffers to the 
empty word $\varepsilon$. We formally define the labelled transition relation on config- 
urations in Figure 1, where the label $\ell$ is either of the form $t, op$ (to denote a 
thread operation) or $t, u$ (to denote an update operation), with $t \in \mathcal{T}$ a thread 
and $op \in Op$ an operation. 

$$\frac{(q, r_1 := r_2, q') \in \Delta_t}{(St, RVal, Buf, Mem) \xrightarrow{t,\, r_1 := r_2} (St[t \leftarrow q'], RVal[r_1 \leftarrow RVal(r_2)], Buf, Mem)}\ \textsc{(assign)}$$

$$\frac{(q, r_1 := \circledast, q') \in \Delta_t \quad d \in D}{(St, RVal, Buf, Mem) \xrightarrow{t,\, r_1 := \circledast} (St[t \leftarrow q'], RVal[r_1 \leftarrow d], Buf, Mem)}\ \textsc{(new value)}$$

$$\frac{(q, rl(r_1, r_2), q') \in \Delta_t \quad rl(RVal(r_1), RVal(r_2))}{(St, RVal, Buf, Mem) \xrightarrow{t,\, rl(r_1, r_2)} (St[t \leftarrow q'], RVal, Buf, Mem)}\ \textsc{(relation)}$$

$$\frac{(q, wt(x, r_1), q') \in \Delta_t}{(St, RVal, Buf, Mem) \xrightarrow{t,\, wt(x, r_1)} (St[t \leftarrow q'], RVal, Buf[t \leftarrow (x, RVal(r_1)).Buf(t)], Mem)}\ \textsc{(write)}$$

$$\frac{(q, rd(x, r_1), q') \in \Delta_t \quad \nexists d \in D : (x, d) \in Buf(t)}{(St, RVal, Buf, Mem) \xrightarrow{t,\, rd(x, r_1)} (St[t \leftarrow q'], RVal[r_1 \leftarrow Mem(x)], Buf, Mem)}\ \textsc{(global read)}$$

$$\frac{(q, rd(x, r_1), q') \in \Delta_t \quad Buf(t) = \alpha.(x, d).\beta \quad \alpha, \beta \in (\mathcal{X} \times D)^* \quad \nexists d' \in D : (x, d') \in \alpha}{(St, RVal, Buf, Mem) \xrightarrow{t,\, rd(x, r_1)} (St[t \leftarrow q'], RVal[r_1 \leftarrow d], Buf, Mem)}\ \textsc{(local read)}$$

$$\frac{(q, arw(x, r_1, r_2), q') \in \Delta_t \quad Buf(t) = \varepsilon \quad RVal(r_1) = Mem(x)}{(St, RVal, Buf, Mem) \xrightarrow{t,\, arw(x, r_1, r_2)} (St[t \leftarrow q'], RVal, Buf, Mem[x \leftarrow RVal(r_2)])}\ \textsc{(atomic read write)}$$

$$\frac{}{(St, RVal, Buf[t \leftarrow Buf(t).(x, d)], Mem) \xrightarrow{t,\, u} (St, RVal, Buf, Mem[x \leftarrow d])}\ \textsc{(memory update)}$$

Fig. 1. The transition relation of TSO. We assume that $St(t) = q$. 


The Reachability Problem Reach. Given a concurrent program $Prog$ and a state 
$q_{final} \in Q_t$ of thread $t$, $Reach$ asks if a configuration $\gamma = (St, RVal, Buf, Mem)$ 
with $St(t) = q_{final}$ is reachable in the transition system given by the TSO 
semantics of $Prog$. In this case, we say that the state $q_{final}$ is reachable by $Prog$. 
We use $Reach[D, \mathit{Rl}]$ to denote the reachability problem for a concurrent program 
with the data theory $(D, \mathit{Rl})$. 


4 Lamport's Bakery Algorithm 


To demonstrate the practical application of our model, we use it to model Lam- 
port's Bakery Algorithm [26]. Created by Leslie Lamport in 1974, it is a cor- 
nerstone solution for achieving mutual exclusion in concurrent systems. Picture 
threads as patrons entering a bakery, each of whom is handed a unique ticket upon arrival. 
These tickets, representing the order of entry, dictate the sequence for access- 
ing critical sections. They ensure an orderly execution flow and prevent race 
conditions in the critical section. 
Each thread is assigned a unique number that is larger than the numbers 
currently assigned to other threads. The thread possessing the lowest number is 
granted entry to the critical section. This thread may access the critical section 
an unbounded number of times. This means the assigned tickets keep increasing, 
and thus an infinite domain is required. Note that the algorithm does not rely 
on precise ticket values; we only need to compare the tickets to each other. This 
makes the protocol well suited to our program model. 
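Because only comparisons between tickets matter, any strictly increasing relabelling of the ticket values leaves such decisions unchanged. The Python sketch below illustrates this with a hypothetical helper `next_to_enter`. The helper simply picks the holder of the lowest ticket, breaking ties by thread name (an assumption of ours), and ignores the chosen flags of the full protocol.

```python
def next_to_enter(tickets):
    """Return the thread holding the lowest ticket (ties broken by name)."""
    return min(tickets, key=lambda t: (tickets[t], t))


def relabel(tickets, f):
    """Apply a strictly increasing map f to every ticket value."""
    return {t: f(v) for t, v in tickets.items()}


tickets = {"t1": 7, "t2": 3, "t3": 12}
stretched = relabel(tickets, lambda v: 1000 * v + 42)
print(next_to_enter(tickets), next_to_enter(stretched))  # t2 t2
```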


The protocol contains $n$ threads, where each thread $i \leq n$ is associated with 
two variables: the ticket number $ticket_i$ and the flag $chosen_i$, which signals 
whether the thread has chosen a ticket number. We assume $r_{TRUE}$ and $r_{FALSE}$ 
are initialized with different values that represent the boolean values of a flag, 
and that $ticket_i$ is initially the same as $r_{FALSE}$ for all $i \leq n$. 


The algorithm for thread $i$ is given in Algorithm 1. For the sake of simplicity 
and compactness, we present the transition system as pseudocode. This is equiv- 
alent to a program definition since the code only accesses variables and registers 
using operations in $Op$ with relations in $\mathit{Rl}_{<_n}$. The remaining instructions only affect 
the finite control flow and can be expressed using transitions. 


Algorithm 1 Lamport Bakery Protocol 

 1: wt(chosen_i, r_FALSE)                {Begin choosing}
 2: r_i := ⊛                             {Pick random ticket}
 3: for all 1 ≤ j ≤ n do
 4:   rd(ticket_j, r_j)
 5:   if (r_i < r_j) then
 6:     goto line 1                      {New ticket needed}
 7:   end if
 8: end for
 9: wt(ticket_i, r_i)                    {Ticket accepted}
10: wt(chosen_i, r_TRUE)                 {Choosing finished}
11: for all 1 ≤ j ≤ n do
12:   rd(chosen_j, r_j)
13:   if (r_j ≠ r_TRUE) then
14:     goto line 12                     {Thread j is still choosing}
15:   end if
16:   rd(ticket_j, r_j)
17:   if (r_j ≠ r_FALSE ∧ r_j < r_i) then
18:     goto line 16                     {Lower ticket j found}
19:   end if
20: end for
21: CRITICAL Section
22: r_i := r_FALSE
23: goto line 1                          {Back to NON-CRITICAL}


Verification under TSO with an infinite Data Domain 283 


5 State Reachability for TSO with (Dis)-Equality Relation 


We show that the reachability problem for concurrent programs under TSO is 
undecidable when $\{=, \neq\} \subseteq \mathit{Rl}$. The proof is achieved through a reduction from 
the state reachability problem of Lossy Channel Systems with Data (DLCS) [1], 
which is known to be undecidable. To simulate the lossy channel, we 
employ write buffers, as both are first-in-first-out queues. However, 
there are three main distinctions that must be considered: (i) write buffers 
do not contain letters, (ii) write buffers are not lossy, and (iii) the semantics of 
reads differs from that of receives. 

We address these distinctions as follows: (i) We encode the letters as variables. 
(ii) We model writes being lost by never reading them. (iii) To prevent a thread 
from reading its own buffered writes, we transfer the writes into the write buffer of a second 
thread, using a different variable. We ensure that every write is read at most once by 
immediately overwriting it with a different value. 
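The "overwrite immediately" idea can be checked on a small scale. The Python sketch below is our own simplified model, not the paper's construction. A single writer has buffered the sequence d1, \$, d2, \$, d3, \$ for one variable whose memory value is initially \$. A single reader alternates between waiting for a non-\$ value (which it records) and waiting for \$. Exploring all interleavings of memory updates and reads shows that the reader collects exactly the subsequences of d1 d2 d3: values can be lost, but never duplicated or reordered, which is the behaviour of a lossy channel.

```python
from collections import deque
from itertools import combinations

SEP = "$"
DATA = ["d1", "d2", "d3"]
# Writes buffered by the writer, oldest first: each value is followed by $.
BUFFER = [w for d in DATA for w in (d, SEP)]


def received_sequences():
    """All value sequences the reader can collect, over all interleavings."""
    # State: (number of propagated writes, reader mode, values received so far).
    init = (0, "data", ())
    seen, todo = {init}, deque([init])
    while todo:
        k, mode, got = todo.popleft()
        mem = SEP if k == 0 else BUFFER[k - 1]  # memory holds $ initially
        succs = []
        if k < len(BUFFER):
            succs.append((k + 1, mode, got))        # memory update
        if mode == "data" and mem != SEP:
            succs.append((k, "sep", got + (mem,)))  # read a value, then need $
        if mode == "sep" and mem == SEP:
            succs.append((k, "data", got))          # read the separator
        for s in succs:
            if s not in seen:
                seen.add(s)
                todo.append(s)
    return {got for (_, _, got) in seen}


subsequences = {c for n in range(len(DATA) + 1) for c in combinations(DATA, n)}
print(received_sequences() == subsequences)  # True
```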


Theorem 1. $Reach[D, \mathit{Rl}]$ is undecidable for $\{=, \neq\} \subseteq \mathit{Rl}$. 


The rest of this section is devoted to the proof of the above theorem. We first 
recall the definition of Lossy Channel Systems with Data (DLCS) [1]. Then, we 
present the reduction from the state reachability problem of DLCS to $Reach[D, \mathit{Rl}]$. 


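$$\frac{(q, x := y, q') \in \Delta_{\mathcal{L}}}{(q, XVal, w) \xrightarrow{x := y} (q', XVal[x \leftarrow XVal(y)], w)}\ \textsc{(assign)}$$

$$\frac{(q, x := \circledast, q') \in \Delta_{\mathcal{L}} \quad d \in D \setminus \{XVal(y) \mid y \in X_{\mathcal{L}}\}}{(q, XVal, w) \xrightarrow{x := \circledast} (q', XVal[x \leftarrow d], w)}\ \textsc{(new value)}$$

$$\frac{(q, x = y, q') \in \Delta_{\mathcal{L}} \quad XVal(x) = XVal(y)}{(q, XVal, w) \xrightarrow{x = y} (q', XVal, w)}\ \textsc{(equality)}$$

$$\frac{(q, x \neq y, q') \in \Delta_{\mathcal{L}} \quad XVal(x) \neq XVal(y)}{(q, XVal, w) \xrightarrow{x \neq y} (q', XVal, w)}\ \textsc{(disequality)}$$

$$\frac{(q, !(a, x), q') \in \Delta_{\mathcal{L}}}{(q, XVal, w) \xrightarrow{!(a, x)} (q', XVal, (a, XVal(x)).w)}\ \textsc{(send)}$$

$$\frac{(q, ?(a, x), q') \in \Delta_{\mathcal{L}}}{(q, XVal, w.(a, d)) \xrightarrow{?(a, x)} (q', XVal[x \leftarrow d], w)}\ \textsc{(receive)}$$

$$\frac{w' \preceq w}{(q, XVal, w) \xrightarrow{loss} (q, XVal, w')}\ \textsc{(lossiness)}$$

Fig. 2. The transition relation of DLCS 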
Lossy Channel Systems with Data. A DLCS $\mathcal{L} = (Q_{\mathcal{L}}, X_{\mathcal{L}}, \Sigma_{\mathcal{L}}, \Delta_{\mathcal{L}}, q_{init})$ consists 
of a finite set of states $Q_{\mathcal{L}}$, a finite set of variables $X_{\mathcal{L}}$ ranging over an infinite 
domain $D$, a finite channel alphabet $\Sigma_{\mathcal{L}}$, an initial state $q_{init} \in Q_{\mathcal{L}}$, and a finite 
set of transitions $\Delta_{\mathcal{L}} \subseteq Q_{\mathcal{L}} \times Op_{\mathcal{L}} \times Q_{\mathcal{L}}$. 
Let $x, y \in X_{\mathcal{L}}$. The set $Op_{\mathcal{L}}$ consists of the following operations: (1) $x := y$, which 
assigns the value of $y$ to $x$, (2) $x := \circledast$, which assigns a fresh value from $D$ that 
is different from the existing values of all variables¹, (3) $x = y$ ($x \neq y$), which 
compares the values of variables $x$ and $y$, (4) $!(a, x)$, which appends the letter $a \in \Sigma_{\mathcal{L}}$ 
together with the value of $x$ to the channel, (5) $?(a, x)$, which deletes the head $(a, d)$ 
of the channel and stores the value $d$ in $x$, and (6) $loss$, which removes 
elements from the channel. 

A configuration $\gamma$ of a DLCS is a tuple $(q, XVal, w)$ where $q \in Q_{\mathcal{L}}$ 
is the current state, $XVal : X_{\mathcal{L}} \to D$ is the current valuation of the variables, 
and $w \in (\Sigma_{\mathcal{L}} \times D)^*$ is the content of the lossy channel. The system is lossy, 
which means any element in the channel may disappear at any time. The initial 
configuration $\gamma_{init}$ of $\mathcal{L}$ is $(q_{init}, XVal_{init}, \varepsilon)$ where $XVal_{init}(x) = 0$ for all 
$x \in X_{\mathcal{L}}$. The transition relation of DLCS is given in Figure 2. 

The state reachability problem for $\mathcal{L}$ asks whether, for a given final state 
$q_{final} \in Q_{\mathcal{L}}$, there is a reachable configuration $\gamma$ of the form $\gamma = (q_{final}, XVal, w)$. 
In this case, we say that the state $q_{final}$ is reachable by $\mathcal{L}$. 
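As a small executable reading of the DLCS operations, the Python class below is a sketch under our own assumptions. Values are natural numbers, and the fresh-value operation deterministically picks the smallest value not held by any variable, whereas the DLCS may pick any such value. Loss drops one message at a time; repeating it yields any subsequence. The class name and methods are ours, and changes of the control state are left to the caller. The channel tuple keeps the newest message on the left and the head (oldest message) on the right.

```python
class DLCS:
    """Sketch of a DLCS configuration (q, XVal, w) and its operations."""

    def __init__(self, q_init, variables):
        self.q = q_init                       # control state (not updated here)
        self.xval = {x: 0 for x in variables}
        self.w = ()                           # left: newest, right: head

    def assign(self, x, y):                   # x := y
        self.xval[x] = self.xval[y]

    def fresh(self, x):                       # x := ⊛ (smallest unused value)
        used = set(self.xval.values())
        self.xval[x] = min(d for d in range(len(used) + 1) if d not in used)

    def send(self, a, x):                     # !(a, x): prepend (a, XVal(x))
        self.w = ((a, self.xval[x]),) + self.w

    def receive(self, a, x):                  # ?(a, x): head must carry letter a
        if not self.w or self.w[-1][0] != a:
            return False
        self.xval[x] = self.w[-1][1]
        self.w = self.w[:-1]
        return True

    def lose(self, i):                        # loss: drop the message at index i
        self.w = self.w[:i] + self.w[i + 1:]


c = DLCS("q0", ["x", "y"])
c.fresh("x")                  # x gets 1, which differs from y's value 0
c.send("a", "x")
c.send("b", "y")
print(c.w)                    # (('b', 0), ('a', 1))
print(c.receive("b", "y"))    # False: the head carries letter a
print(c.receive("a", "y"), c.xval["y"])  # True 1
```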


Theorem 2 ([1]). The state reachability problem for DLCS is undecidable. 


[Figure: gadgets $Gad_{x := \circledast}$, $Gad_{!(a,x)}$, and $Gad_{?(a,x)}$ of thread $t$, and the copying loop of thread $t_{ch}$] 

Fig. 3. $Prog(\mathcal{L})$ with threads $t$ (pink states) and $t_{ch}$ (yellow states). 


¹ This differs from $\circledast$ in TSO, where the value $d \in D$ assigned by the operation 
$x := \circledast$ can be anything. 
Reduction from DLCS reachability. Given a DLCS $\mathcal{L} = (Q_{\mathcal{L}}, X_{\mathcal{L}}, \Sigma_{\mathcal{L}}, \Delta_{\mathcal{L}}, q_{init})$ 
over data domain $D$ with $X_{\mathcal{L}} = \{x_1, \ldots, x_n\}$, we reduce the state reachability of $\mathcal{L}$ 
to the reachability problem $Reach[D, \{=, \neq\}]$ of a concurrent program $Prog(\mathcal{L})$ 
with two threads $t$ and $t_{ch}$. The thread $t$ simulates the operations of $\mathcal{L}$, while thread 
$t_{ch}$ simulates the lossy channel of $\mathcal{L}$ using its write buffer. Let $R_t = \{r_{\$}, r_{tmp}\} \cup 
\{r_x \mid x \in X_{\mathcal{L}}\}$ and $R_{t_{ch}} = \{r^{ch}_{\$}, r^{ch}_{tmp}\}$ be the local registers of threads $t$ and $t_{ch}$. 
Corresponding to each $x \in X_{\mathcal{L}}$, we have the register $r_x$ in thread $t$, which stores 
the current value of $x$. Registers $r_{tmp}$ and $r^{ch}_{tmp}$ are used to temporarily store 
certain values. The shared variables of $Prog(\mathcal{L})$ are $\mathcal{X} = \{x_a, y_a \mid a \in \Sigma_{\mathcal{L}}\}$; they 
help in simulating the behavior of the lossy channel of $\mathcal{L}$. 


Simulating the DLCS. The transitions of $Prog(\mathcal{L})$ are sketched in Figure 3. The 
initialization of the program is omitted in the figure and goes as follows. The 
thread $t_{ch}$ starts by assigning a non-deterministic value (say $\$$) to the register 
$r^{ch}_{\$}$ (i.e., $r^{ch}_{\$} := \circledast$), then checks that the new value $\$$ is different from 0 (i.e., by 
checking that $r^{ch}_{\$} \neq r^{ch}_{tmp}$), and finally performs an atomic read-write operation 
$arw(x, r^{ch}_{tmp}, r^{ch}_{\$})$ on each variable $x \in \mathcal{X}$. The thread $t$ starts by reading the value 
of each shared variable $x \in \mathcal{X}$ (i.e., performing $rd(x, r_{\$})$) and checks that its value 
is different from 0 (i.e., $r_{\$} \neq r_{tmp}$). At the end of this initialization phase, all 
the shared variables have the new value $\$$, the registers $r_{tmp}$ and $r^{ch}_{tmp}$ have the 
value 0, and the registers $r_{\$}$ and $r^{ch}_{\$}$ have the value $\$$. The current state of thread 
$t$ is the initial state $q_{init}$ of $\mathcal{L}$, while the thread $t_{ch}$ is in a state $q_{ch}$. 

Every transition $(q, op, q') \in \Delta_{\mathcal{L}}$ is simulated in $Prog(\mathcal{L})$ by thread $t$ with 
a gadget, i.e., a sequence of transitions that starts in $q$ and ends in $q'$. The transitions 
$(q, x := y, q')$, $(q, x = y, q')$, and $(q, x \neq y, q')$ in the DLCS are simulated by the 
thread $t$ as gadgets with single transitions $(q, r_x := r_y, q')$, $(q, r_x = r_y, q')$, and 
$(q, r_x \neq r_y, q')$, respectively. We omit them in Figure 3. 

To simulate $x := \circledast$, we load the new value into register $r_{tmp}$ and ensure that it 
is different from the values in registers $r_{\$}$ and $r_{x_1}, \ldots, r_{x_n}$. This is depicted by the 
gadget $Gad_{x := \circledast}$ in thread $t$. The send operation $!(a, x)$ in the DLCS is simulated 
by the gadget $Gad_{!(a,x)}$. In the DLCS, the send appends the letter $a$ and the 
value of $x$ to the channel. This is simulated by the write $wt(x_a, r_x)$, thereby 
appending $(x_a, RVal(r_x))$ to the buffer of $t$. To simulate reads of the DLCS, we 
first note a crucial difference in the way reads happen in DLCS and 
TSO. In DLCS, a read happens from the head of the channel, and the head is 
deleted immediately after the read. In TSO, however, we can read the latest 
write in the shared memory multiple times. In order to simulate the "read once" 
policy of the DLCS, we follow each $wt(x_a, r_x)$ with another write $wt(x_a, r_{\$})$. 

Thread $t_{ch}$ is a loop from the state $q_{ch}$ which continuously reads from $x_a$ a 
value from a simulated send followed by the separator $\$$. It copies these values 
to $y_a$ using the local register $r^{ch}_{tmp}$. The first time it reads from $x_a$, it reads the value 
$d$ of $x$ from a simulated send $!(a, x)$. It checks that this is not the $\$$ symbol 
($r^{ch}_{tmp} \neq r^{ch}_{\$}$), and writes this value from $r^{ch}_{tmp}$ into variable $y_a$, thus appending 
$(y_a, d)$ to the buffer of $t_{ch}$. It then reads $x_a$ into $r^{ch}_{tmp}$ again. This 
time, it makes sure to read $\$$ with the check $r^{ch}_{tmp} = r^{ch}_{\$}$. The receive $?(a, x)$ of 
the DLCS is simulated by $Gad_{?(a,x)}$. First, we read from $y_a$ and store the value in $r_x$, 
ensuring this value $d$ is not $\$$. Then, we read $\$$ from $y_a$. This ensures that the 
earlier value $d$ is overwritten in the memory and is not read twice. 

A loss in the channel of the DLCS results in losing some messages $(a, d)$. 
This is accounted for in $Prog(\mathcal{L})$ in two ways. Thread $t_{ch}$ may not pass on a value 
written to $x_a$ into $y_a$, since the loop may not execute for every value. Thread 
$t$ may not read a value written by $t_{ch}$ to $y_a$, since it may already have been overwritten by 
some later write. 


Lemma 1. The state $q_{final}$ is reachable by $\mathcal{L}$ if and only if $q_{final}$ is reachable 
by $Prog(\mathcal{L})$. 


The formal proof is given in Appendix A of the full version [3]. Theorem 1 extends 
to any set of relations that we can use to simulate equality and disequality, for 
instance $\{\leq, \neq\} \subseteq \mathit{Rl}$, since $x = y$ holds if and only if $x \leq y$ and $y \leq x$. 


6 Context Bounded Analysis 


In light of this undecidability, we turn our attention to a variant of the 
reachability problem which is tractable. We study context bounded runs, an 
under-approximation of the program behavior that limits the possible interac- 
tions between processes. A run consists of a number of contexts. A context is 
a sequence of steps where only a certain fixed thread $t$ is active. We say that 
$\pi \in CB(k)$ if and only if there is a partitioning $\pi = \pi_1 \ldots \pi_k$ such that for 
all contexts $i \leq k$ there is an active thread $t_i \in \mathcal{T}$ such that only the active 
thread updates the memory and performs operations: if $\gamma \xrightarrow{\ell} \gamma' \in \pi_i$, then 
$\ell \in \{t_i\} \times (Op \cup \{u\})$. 
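Membership in CB(k) can be decided for a concrete run by counting thread switches. The Python sketch below assumes that runs are given as lists of labels (thread, action), where the action is an operation or "u" for a memory update, and that empty contexts are allowed. Under these assumptions, π ∈ CB(k) iff the number of maximal blocks of consecutive labels of the same thread is at most k. The helper names are ours.

```python
from itertools import groupby


def min_contexts(run):
    """Minimal number of contexts for a run given as (thread, action) labels.

    A context boundary is forced exactly where the acting thread changes,
    so the minimum equals the number of maximal same-thread blocks.
    """
    return sum(1 for _thread, _block in groupby(run, key=lambda label: label[0]))


def in_cb(run, k):
    """True iff the run can be split into at most k contexts."""
    return min_contexts(run) <= k


run = [("t1", "wt(x,r1)"), ("t1", "u"), ("t2", "rd(x,r2)"), ("t1", "u")]
print(min_contexts(run), in_cb(run, 2), in_cb(run, 3))  # 3 False True
```

The final update of t1 needs its own context because, within the context of t2, only t2 may update the memory.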


In the following, we show PSPACE completeness of $CB(k)$-$Reach[D, \mathit{Rl}_{<_n}]$ 
for relations such as (dis)equality, "greater than", or even "greater by at least 
$n$" for $n \in \mathbb{N}$ (see Theorem 4). Our approach begins with a proof of PSPACE 
hardness through a reduction from the non-emptiness problem of the intersection 
of regular languages [21]. 

Next, we demonstrate PSPACE membership by reducing the problem to 
state reachability of a finite transition system which we solve in polynomial 
space. This reduction faces challenges from two main sources, namely, (i) the 
unbounded size of the write buffers, and (ii) the infinite data domain D. In this 
section, we show how to construct a finite transition system while preserving 
state reachability in two key steps. 

Following [14], we first perform a buffer abstraction. An in-depth analysis 
of the TSO semantics within context bounded runs reveals a critical insight: 
Even though the buffer may contain an unbounded number of writes, only a 
bounded number of these writes can be read later on. This allows us to non- 
deterministically identify and store the necessary writes using variables. 

Finally, we implement a domain abstraction. A popular approach is to ab- 
stract the values into equivalence classes based on the supported relations. This 
reveals our next challenge: (iii) the set of relations Rl_{<n} is infinite. We conduct
an analysis of the reachable configurations and discover the following: If a configuration
is reachable, then any configuration that is the same except with greater
distances between differing values is reachable as well. It follows that, for control 
state reachability, the abstraction does not require the precise distances between 
variables; their relative order is sufficient. 


6.1 Lower-bound 


We establish PSPACE hardness by polynomially reducing the problem of
checking non-emptiness of the intersection of regular languages to
CB(k)-Reach[D, Rl_{<n}]. Given a set of finite automata A_1, ..., A_n with
A_i = (Q_i, Δ_i, q_i^init, Q_i^F), where Δ_i ⊆ Q_i × Σ × Q_i, q_i^init ∈ Q_i, and Q_i^F ⊆ Q_i for i ≤ n,
the problem asks whether there is a word w ∈ Σ* that is accepted by each
automaton A_i with i ≤ n. This is known to be PSPACE hard [21].

We construct a program Prog(A_1 ... A_n) that consists of a single thread
and reaches a state q_final if and only if there is such a word. The idea of the
construction is that we assign each state q ∈ Q_i a unique value stored in a
register r_q, and we store the value of the current state of each automaton A_i
in a register r_i. To begin, we ensure that the current states are the initial ones.
This means r_i = r_{q_i^init} holds for each i ≤ n. Then, we choose a letter a ∈ Σ
and simulate some transition q_i --a--> q_i' ∈ Δ_i for each automaton. This is done by
ensuring that the current state is q_i with r_i = r_{q_i}, and then updating the current
state with r_i := r_{q_i'}. We repeat this step until each current state is a final state.
At this point, we know we have simulated runs for each automaton that accept
the same word, and we reach q_final.
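To make the reduction tangible, the following Python sketch explores every run of the single-thread program described above. It is an illustration under our own assumptions, not the construction of Appendix B of [3]: the register values r_q are modeled as distinct integers, the program's nondeterministic choices (a letter and one transition per automaton) are enumerated by breadth-first search, and `NFA` and `program_reaches_final` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Set, Tuple

Transition = Tuple[str, str, str]  # (source state, letter, target state)

@dataclass
class NFA:
    delta: Set[Transition]
    init: str
    finals: Set[str]

def program_reaches_final(automata: List[NFA], alphabet: Set[str]) -> bool:
    """Explores all runs of the program: r_q holds a distinct value per state q,
    r_i holds the value of A_i's current state, and each step only compares
    and copies register values."""
    reg: Dict[Tuple[int, str], int] = {}          # (i, q) -> value of r_q
    for i, a in enumerate(automata):
        states = {a.init, *a.finals} | {s for (p, _, q) in a.delta for s in (p, q)}
        for s in sorted(states):
            reg[(i, s)] = len(reg)                # distinct values: disequality holds
    start = tuple(reg[(i, a.init)] for i, a in enumerate(automata))  # r_i := r_init
    finals = [{reg[(i, f)] for f in a.finals} for i, a in enumerate(automata)]
    seen, queue = {start}, deque([start])
    while queue:
        current = queue.popleft()
        if all(v in finals[i] for i, v in enumerate(current)):
            return True                           # every current state is final
        for letter in sorted(alphabet):
            # Targets r_{q'} of the transitions whose source value equals r_i.
            options = [[reg[(i, q)] for (p, b, q) in a.delta
                        if b == letter and reg[(i, p)] == current[i]]
                       for i, a in enumerate(automata)]
            for nxt in product(*options):         # empty if some A_i is stuck
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False
```

With one automaton for words of even length and one for words ending in a, the program reaches its final state (for instance via "aa"); with two automata accepting only "a" and only "b", it does not.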

The formal definition of the construction as well as the proof of correctness
is given in Appendix B of [3]. This is a polynomial reduction of non-emptiness of
the intersection of regular languages to CB(k)-Reach[D, Rl_{<n}]. Observe that we
only need to test for equality and disequality. The disequality checks are necessary
to ensure that each register r_q has been assigned a different value.


Theorem 3. CB(k)-Reach[D, Rl_{<n}] is PSPACE hard.


6.2 PSPACE Upper-bound 


Assume that we are given a program Prog and a context bound k. As an
intermediary step towards a finite state space, we construct a finite state machine
AB(Prog, k) with variables over the infinite data domain D. The name AB stands
for abstract buffer, as it abstracts from the unbounded write buffers using a finite
number of variables. We show that AB(Prog, k) is state reachability equivalent
to the TSO semantics of Prog bounded by CB(k).

While abstracting away the buffers, the main challenge is to simulate read
operations. Recall from Section 3 that each read operation in a thread accesses
either a write from its own buffer or from the shared memory. A buffer read
always reads from the thread's latest write on the same variable. Since only the
active thread may interact with the memory during the context, we can assume
w.l.o.g. that all memory updates occur at the end of a context. This means
a memory read accesses the last write on the same variable that updated the
memory in an earlier context, and hence we do not need to store the whole buffer
content. For memory reads, we need the latest writes leaving the buffer at the
end of each context for each variable. For buffer reads, we only require the latest
writes on each variable that are issued by each thread.
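The TSO read behavior recalled above can be illustrated with a minimal Python model, assuming one FIFO write buffer per thread and a shared memory dictionary; `TsoThread` is a hypothetical name. A buffer read only ever inspects the newest buffered write on the variable, which is why the abstraction can keep just that write per thread and variable.

```python
from collections import deque
from typing import Deque, Dict, Tuple

class TsoThread:
    """Minimal TSO model (illustrative assumption): each thread owns a FIFO write
    buffer; writes enter the buffer and memory updates drain its oldest entry."""

    def __init__(self, memory: Dict[str, int]):
        self.memory = memory                      # shared memory, common to all threads
        self.buffer: Deque[Tuple[str, int]] = deque()

    def write(self, var: str, value: int) -> None:
        self.buffer.append((var, value))

    def read(self, var: str) -> int:
        # Buffer read: the latest own write on var, if one is still buffered.
        for v, value in reversed(self.buffer):
            if v == var:
                return value
        # Memory read: otherwise fall back to the shared memory.
        return self.memory[var]

    def update(self) -> None:
        # Memory update: the oldest buffered write leaves the buffer
        # (assumes the buffer is non-empty).
        var, value = self.buffer.popleft()
        self.memory[var] = value
```

After t1 writes x = 1 and x = 2, t1 already reads 2 from its buffer while another thread still reads the old memory value; only the memory updates make the writes visible, in order.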


Construction of the abstract machine. The abstract machine AB(Prog, k) is
defined by the tuple (Q_AB, X_AB, Δ_AB, q_AB^init), where Q_AB is the finite set of states, X_AB
is the finite set of variables, Δ_AB is the transition relation, and q_AB^init is the
initial state. A control state q_AB ∈ Q_AB is a tuple (St, act, j, c, u) where: (i) the
current state of every thread is stored using function St : T → Q; (ii) function
act : {1, ..., k} → T assigns to each context an active thread; (iii) the current
context is stored in variable j ∈ {1, ..., k}; (iv) the function c : X × T → {0, 1, ..., k}
assigns to each variable x ∈ X and thread t ∈ T the (future) context j' in
which the latest write on x will leave the write buffer of t. This determines
when t can access the shared memory on that variable again; and (v) function
u : {1, ..., k} → 2^X assigns to each context j the set of variables that are updated
during j. Additionally, we will introduce some helper states with the transition
relation. We omit them from the definition of Q_AB. The initial state q_AB^init is such
a helper state.

The set of variables X_AB contains: (i) the set of variables X in Prog, (ii) the
set of registers R, (iii) for each context j ≤ k and each variable x ∈ X, a
variable x_j, which stores the value of the last write on x that leaves
the write buffer in context j, and (iv) for each thread t and each variable x ∈ X,
a variable x_t, which stores the value of the newest write of t on x
that is still in the buffer of t. Notice that this is the write that t accesses when
reading x (if such a write exists).
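The variable set X_AB stays polynomial in the program size and k: it has |X| + |R| + k·|X| + |T|·|X| elements. The sketch below enumerates it; the naming scheme "x@j" for x_j and "x#t" for x_t, as well as the function name, are hypothetical.

```python
from typing import List

def abstract_variables(program_vars: List[str], registers: List[str],
                       threads: List[str], k: int) -> List[str]:
    """Lists X_AB: program variables, registers, one copy x_j per context j,
    and one copy x_t per thread t (hypothetical names "x@j" and "x#t")."""
    result = list(program_vars) + list(registers)
    result += [f"{x}@{j}" for j in range(1, k + 1) for x in program_vars]
    result += [f"{x}#{t}" for t in threads for x in program_vars]
    return result
```

For two program variables, one register, two threads, and k = 3, this yields 2 + 1 + 6 + 4 = 13 variables.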

We define the transition relation Δ_AB in Figure 4. Let c_init(x, t) = 0 for all x ∈
X and t ∈ T, and u_init(i) = ∅ for all i ∈ {1, ..., k}. The outgoing transitions of
state q_AB^init are the outgoing transitions of (St_init, act, 0, c_init, u_init) for every possible
function act. This means the construction guesses a function act and behaves as
if the other elements in the tuple have their initial values. Local transitions are
adapted in a straightforward manner. A read on x from the buffer occurs if there
is a write on x in the buffer. This means the latest write on x leaves the buffer
in a context c(x, t) after (or in) the current context j, i.e., c(x, t) ≥ j. In such a case, we access
x_t, which holds the latest write on x in the buffer of t. If there is no such write
on x in the buffer, i.e., c(x, t) < j holds, then the read fetches the value of x from
the shared memory.

A write operation on x overwrites the latest entry in the write buffer on that
variable x, and determines a future (or current) context j' with j' ≥ j in which
it leaves the buffer. This is recorded in the variable x_{j'}, and x is added to the
set u(j'), which holds the variables that are updated in context j'. Note that j'
cannot be smaller than any other context in which a write on a variable y leaves
the buffer of t. This information is obtained from the function c. Also, j' must
be a context in which t is active.
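A minimal Python sketch of the buffer read, memory read, and write rules, under our own assumptions: the valuation of X_AB is a dictionary, the copies x_j and x_t are named "x@j" and "x#t", c and u are dictionaries, and St is left out. `AbState`, `read`, `write_choices`, and `write` are hypothetical names, and the nondeterministic guess of j' is exposed as the list of admissible contexts.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

@dataclass
class AbState:
    """Part (act, j, c, u) of a control state of AB(Prog, k) plus a valuation of X_AB."""
    act: Dict[int, str]                # context -> active thread
    j: int                             # current context
    c: Dict[Tuple[str, str], int]      # (variable, thread) -> context the latest write leaves
    u: Dict[int, Set[str]]             # context -> variables updated in it
    mem: Dict[str, int] = field(default_factory=dict)

def read(s: AbState, t: str, x: str, r: str) -> None:
    """rd(x, r) executed by thread t."""
    if s.c[(x, t)] >= s.j:
        s.mem[r] = s.mem[f"{x}#{t}"]   # buffer read: latest own write on x
    else:
        s.mem[r] = s.mem[x]            # memory read

def write_choices(s: AbState, t: str, variables: List[str], k: int) -> List[int]:
    """Contexts j' admissible for a write of t: j' >= j, j' >= c(y, t) for all y,
    and t is active in j'."""
    lower = max([s.j] + [s.c[(y, t)] for y in variables])
    return [jp for jp in range(lower, k + 1) if s.act[jp] == t]

def write(s: AbState, t: str, x: str, r: str, jp: int) -> None:
    """wr(x, r) executed by thread t, leaving the buffer in the guessed context jp."""
    s.mem[f"{x}#{t}"] = s.mem[r]       # x_t := r
    s.mem[f"{x}@{jp}"] = s.mem[r]      # x_{j'} := r
    s.c[(x, t)] = jp
    s.u.setdefault(jp, set()).add(x)
```

Once t1 has written x with j' = 3, its own reads of x are served from x_t, other threads still read the memory, and any later write of t1 must choose j' ≥ 3.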
Rules of Fig. 4, written as premises ⟹ conclusion:
init: (q_AB, op, q'_AB) ∈ Δ_AB, q_AB = (St_init, act, 0, c_init, u_init) ⟹ (q_AB^init, op, q'_AB)
local: op ∈ {r1 := r2, r1 := ⊛, rl(r1, r2)} ⟹ (q_AB, op, q_AB[St(t) ← q_b])
buffer read: op = rd(x, r1), c(x, t) ≥ j ⟹ (q_AB, r1 := x_t, q_AB[St(t) ← q_b])
memory read: op = rd(x, r1), c(x, t) < j ⟹ (q_AB, r1 := x, q_AB[St(t) ← q_b])
write: op = wr(x, r1), j' ≥ j, act(j') = t, j' ≥ max{c(y, t) | y ∈ X} ⟹ (q_AB, x_t := r1, q'), (q', x_{j'} := r1, q_AB[St(t) ← q_b, c(x, t) ← j', u(j') ← u(j') ∪ {x}])
buffer arw: op = arw(x, r1, r2), j = c(x, t) = max{c(y, t) | y ∈ X} ⟹ (q_AB, x_t = r1, q'_1), (q'_1, x_j := r2, q'_2), (q'_2, x_t := r2, q_AB[St(t) ← q_b, u(j) ← u(j) ∪ {x}])
memory arw: op = arw(x, r1, r2), j > c(x, t), j ≥ max{c(y, t) | y ∈ X} ⟹ (q_AB, x = r1, q'), (q', x := r2, q_AB[St(t) ← q_b])
context switch: q_AB ∈ Q_AB, j < k, u(j) = {x^1, ..., x^m} ⟹ (q_AB, x^1 := x^1_j, q^1_new), ..., (q^{m-1}_new, x^m := x^m_j, q_AB[j ← j + 1])

Fig. 4. The transition relation Δ_AB of AB(Prog, k). Let δ = (q_a, op, q_b) ∈ Δ_t and q_AB =
(St, act, j, c, u) with St(t) = q_a and act(j) = t.


At any time, the run can switch from a context j with j < k to j + 1.
Let u(j) = {x^1, ..., x^m}. These are the variables that are updated during context
j. The values of the last updates on these variables in the context, stored in
x^1_j, ..., x^m_j, are written to the corresponding variables in the shared memory. Since
AB(Prog, k) only performs memory updates at the end of a context, an atomic
read-write arw(x, r1, r2) requires that the current buffer content leaves the buffer
in the current context. This is ensured by the condition j ≥ max{c(y, t) |
y ∈ X}. If there is a write on x in the buffer of t, then j = c(x, t). This is covered
by the buffer arw rule in Figure 4. Here, the current value of x is stored in x_t, so
we first check that it equals r1 and update x_j as well as x_t with r2. If j > c(x, t)
holds, then there is no write on x in the buffer of t (memory arw rule), and we
compare the value of x in the shared memory with r1 and update it to r2.
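The context switch and the two arw rules can be sketched in the same hypothetical encoding (dictionaries for c, u, and the valuation; copies named "x@j" and "x#t"). This illustrates the rules as described above and is not the authors' implementation; a blocked rule is reported by returning False, and the function names are ours.

```python
from typing import Dict, Set, Tuple

def context_switch(j: int, k: int, u: Dict[int, Set[str]], mem: Dict[str, int]) -> int:
    """Copies x_j into x for every x in u(j), then moves to context j + 1."""
    if j >= k:
        raise ValueError("no context after the last one")
    for x in sorted(u.get(j, set())):
        mem[x] = mem[f"{x}@{j}"]
    return j + 1

def atomic_read_write(j: int, t: str, x: str, r1: str, r2: str,
                      c: Dict[Tuple[str, str], int], u: Dict[int, Set[str]],
                      mem: Dict[str, int]) -> bool:
    """arw(x, r1, r2) by thread t; returns False if neither arw rule applies."""
    pending = max((v for (_, thread), v in c.items() if thread == t), default=0)
    if j < pending:
        return False                   # the whole buffer of t must leave in context j
    if c[(x, t)] == j:                 # buffer arw: x_t holds the latest value of x
        if mem[f"{x}#{t}"] != mem[r1]:
            return False
        mem[f"{x}@{j}"] = mem[r2]
        mem[f"{x}#{t}"] = mem[r2]
        u.setdefault(j, set()).add(x)
    else:                              # memory arw: j > c(x, t), no buffered write on x
        if mem[x] != mem[r1]:
            return False
        mem[x] = mem[r2]
    return True
```

The check on `pending` encodes j ≥ max{c(y, t) | y ∈ X}; the two branches then correspond to the buffer arw and memory arw rules.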

A configuration γ = (q_AB, Mem) in the induced LTS of AB(Prog, k) consists of a
state q_AB ∈ Q_AB along with a variable assignment Mem. Let γ_init = (q_AB^init, Mem_init)
be the initial configuration of AB(Prog, k). Given the transitions Δ_AB, we can
define the transitions in the induced LTS in a straightforward manner. A state
q_final ∈ Q_t of thread t is said to be reachable by AB(Prog, k) if and only if
there is a reachable configuration of the form ((St, act, j, c, u), Mem) such that
St(t) = q_final holds.


Lemma 2. A state of Prog is reachable under TSO by a run π ∈ CB(k) if and
only if it is reachable by AB(Prog, k).


The proof of Lemma 2 is given in Appendix C of [3]. Next, we abstract away
the infinite data domain from AB(Prog, k). We remove this last source of infinity
by constructing a finite state machine Rl_<-AB(Prog, k) from AB(Prog, k).


290 Parosh A. Abdulla et al. 


Rules of Fig. 5, written as premises ⟹ conclusion:
assign: (q_AB, x := x', q'_AB) ∈ Δ_AB, x =_{Rl'} x', ∀rl ∈ Rl_<, ∀y, z ∈ X_AB \ {x}: rl_Rl(y, z) ⟺ rl_Rl'(y, z) ⟹ ((q_AB, Rl), x := x', (q'_AB, Rl')) ∈ Δ
new value: (q_AB, x := ⊛, q'_AB) ∈ Δ_AB, ∀rl ∈ Rl_<, ∀y, z ∈ X_AB \ {x}: rl_Rl(y, z) ⟺ rl_Rl'(y, z) ⟹ ((q_AB, Rl), x := ⊛, (q'_AB, Rl')) ∈ Δ
Rl_< relation: (q_AB, rl(x, y), q'_AB) ∈ Δ_AB, rl ∈ Rl_<, Rl = Rl', rl_Rl(x, y) ⟹ ((q_AB, Rl), rl(x, y), (q'_AB, Rl')) ∈ Δ
Rl_{<n} relation: (q_AB, rl(x, y), q'_AB) ∈ Δ_AB, rl ∉ Rl_<, Rl = Rl', x <_Rl y ⟹ ((q_AB, Rl), rl(x, y), (q'_AB, Rl')) ∈ Δ

Fig. 5. The transition relation of Rl_<-AB(Prog, k). Sets Rl and Rl' satisfy (i) equality
is an equivalence relation; (ii) disequality holds iff equality does not hold; (iii) "<" is
a total order on variables that are not equal.


Domain Abstraction. We use domain abstraction to solve
CB(k)-Reach[D, Rl_{<n}] by reducing state reachability of AB(Prog, k) to reachability of
a finite state machine. We introduce the set of relations Rl_< = {=, ≠, <}. To
abstract away the infinite data domain, we abstract from the exact values of
the variables. Instead of storing actual values, we store which relations from Rl_<
hold between which pairs of variables, which is finite information. This way,
we reduce the infinite domain D to the finite Boolean domain B. For example,
(q_AB, x = y) is an abstraction of a configuration (q_AB, Mem(x) = 1, Mem(y) = 1).
Given a variable assignment Mem and a relation rl, we define rl_Mem(x, y) :=
rl(Mem(x), Mem(y)). Any variable assignment Mem induces a set of relations
Rl_Mem = {rl_Mem | rl ∈ Rl_<} over the variables X_AB. When considering multiple sets
of relations, we denote a relation rl ∈ Rl as rl_Rl. For a variable assignment Mem,
we say that a set of relations Rl over the variables is consistent with Mem if Rl = Rl_Mem.
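The map from a variable assignment Mem to its set of relations Rl_Mem can be sketched as follows, assuming integers as the data domain and the hypothetical function name `relations_of`. Two assignments that differ only in their distances, such as the pair in the usage below, receive the same abstraction.

```python
from itertools import product
from typing import Dict, Set, Tuple

def relations_of(mem: Dict[str, int]) -> Dict[str, Set[Tuple[str, str]]]:
    """Rl_Mem: for each relation rl in {=, !=, <}, the pairs (x, y) of variables
    with rl(Mem(x), Mem(y))."""
    pairs = list(product(mem, repeat=2))
    return {
        "=": {(x, y) for x, y in pairs if mem[x] == mem[y]},
        "!=": {(x, y) for x, y in pairs if mem[x] != mem[y]},
        "<": {(x, y) for x, y in pairs if mem[x] < mem[y]},
    }
```

For instance, {x: 1, y: 1, z: 5} and {x: 0, y: 0, z: 100} induce identical relation sets, while swapping the order of two values changes the abstraction.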


Given AB(Prog, k) = (Q_AB, X_AB, Δ_AB, q_AB^init), we now construct the finite state
machine Rl_<-AB(Prog, k) = (Q, Δ, q_init) as follows: Q := Q_AB × {rl_Rl : X_AB ×
X_AB → B | rl ∈ Rl_<}. We abstract from a variable assignment by storing in the
states which relations are satisfied. The initial state is q_init = (q_AB^init, Rl_{Mem_init}). We
define the transitions of Rl_<-AB(Prog, k) in Figure 5. We construct the
transitions such that they abstract from the transitions of the LTS induced by the
semantics of AB(Prog, k). Where the semantics on transitions of AB(Prog, k)
require that certain values in the configurations before and after the operation
are the same, the transitions of Rl_<-AB(Prog, k) only require that the relations
between variables before and after the operation are the same. For instance, the
assign rule for operation x := x' requires that Rl and Rl' are the same for all
variables except x, and x =_{Rl'} x' must hold after the operation. Conditions (i)-(iii)
in Figure 5 reflect the properties of Rl_< on values. They ensure that Rl and Rl'
have consistent variable assignments. Note that for any operation <_n,
we soften the condition to x <_Rl y. We will show that this still results in an
abstraction precise enough to be state reachability equivalent.
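Conditions (i)-(iii) of Fig. 5 can be checked mechanically. The sketch below is one reading of them, in which "<" is a strict total order between variables that are not equal and never holds between equal ones; the dictionary encoding of Rl and the name `is_consistent` are our assumptions.

```python
from itertools import product
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]

def is_consistent(variables: Iterable[str], rl: Dict[str, Set[Pair]]) -> bool:
    """Checks conditions (i)-(iii) on Boolean relations "=", "!=", "<"."""
    vs = list(variables)
    eq, ne, lt = rl["="], rl["!="], rl["<"]
    for x, y in product(vs, repeat=2):
        # (ii) disequality holds iff equality does not hold
        if ((x, y) in ne) == ((x, y) in eq):
            return False
        # (iii) "<" never relates equal variables ...
        if (x, y) in eq and (x, y) in lt:
            return False
        # ... and relates every non-equal pair in exactly one direction
        if (x, y) not in eq and ((x, y) in lt) == ((y, x) in lt):
            return False
    for x in vs:
        if (x, x) not in eq:                                    # (i) reflexive
            return False
    for x, y, z in product(vs, repeat=3):
        if (x, y) in eq and (y, x) not in eq:                   # (i) symmetric
            return False
        if (x, y) in eq and (y, z) in eq and (x, z) not in eq:  # (i) transitive
            return False
        if (x, y) in lt and (y, z) in lt and (x, z) not in lt:  # (iii) transitive
            return False
    return True
```

An abstract state with x ≠ y but neither x < y nor y < x, for example, is rejected by the totality check.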


Since Rl_<-AB(Prog, k) is a finite state machine, it induces the obvious LTS
where a configuration consists of a state. The following lemma shows that the
construction is indeed an abstraction of AB(Prog, k). We assume Prog uses Rl_{<n}.


Lemma 3. If q_AB is reachable by AB(Prog, k), then a state (q_AB, Rl) is reachable
by Rl_<-AB(Prog, k).
Proof. Assume (q_AB, Mem) --op--> (q'_AB, Mem'). We argue that
((q_AB, Rl_Mem), op, (q'_AB, Rl_Mem')) ∈ Δ holds as well. The lemma then follows
immediately by induction on the length of the run. We show this for operation x := ⊛. For all other operations, the
proof is analogous and we omit it.

It follows from the semantics of x := ⊛ that Mem(y) = Mem'(y) holds for any
y ∈ X_AB \ {x}. This means Rl_Mem and Rl_Mem' satisfy the new value rule. The
equality relations in Rl_Mem and Rl_Mem' are consistent with the equality relations
on values of Mem and Mem'. The equality relation given by the values is an
equivalence relation, and thus Condition (i) is satisfied. Similarly, Condition (ii) is
satisfied since values are obviously not equal if and only if they are not related by
equality. Condition (iii) is satisfied since relation < on values forms a total order.
All conditions are satisfied. This means ((q_AB, Rl_Mem), x := ⊛, (q'_AB, Rl_Mem')) ∈ Δ.


Lemma 4. If a state (q_AB, Rl) is reachable by Rl_<-AB(Prog, k), then q_AB is
reachable by AB(Prog, k).


We prove this by performing an induction over runs of Rl_<-AB(Prog, k) and
constructing equivalent runs of AB(Prog, k). In order to do this, we construct
configurations with consistent variable assignments. The main challenge is that
these variable assignments may not have large enough distances between the
values. Take the operation x <_n y, for instance. Here, Rl_<-AB(Prog, k) only
requires x < y. Note that any value other than 0 was created by an x := ⊛
operation. We can modify a run so that some of these operations assign larger
values. This way, we can increase the distances of variable assignments of
reachable configurations without changing their consistency with respect to relations.
The formal proof of this is given in Appendix E of [3].
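The intuition behind Lemma 4 can be seen on a single assignment: re-spacing the distinct values preserves every relation in Rl_< while making all distances as large as needed. The sketch below (hypothetical name `stretch`, integer values assumed) only illustrates this point; the proof in Appendix E of [3] instead modifies the x := ⊛ operations of a run.

```python
from typing import Dict

def stretch(mem: Dict[str, int], gap: int) -> Dict[str, int]:
    """Maps the i-th smallest distinct value to i * gap: equal values stay equal,
    the order between values is kept, and different values end up >= gap apart."""
    if gap < 1:
        raise ValueError("gap must be positive")
    distinct = sorted(set(mem.values()))
    position = {value: index * gap for index, value in enumerate(distinct)}
    return {x: position[value] for x, value in mem.items()}
```

With gap = n, every pair x < y in the result also satisfies "y exceeds x by at least n", so a softened check x <_Rl y can be realized concretely.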


Theorem 4. CB(k)-Reach[D, Rl_{<n}] is PSPACE complete.


Proof. While Rl_{<n} is an infinite set, Rl_< has only 3 relations. This means
Rl_<-AB(Prog, k) is a finite transition system where state reachability is
decidable. According to Lemma 2, Lemma 3, and Lemma 4, deciding state reachability
of Rl_<-AB(Prog, k) is equivalent to solving CB(k)-Reach[D, Rl_{<n}].

We non-deterministically solve the state reachability of Rl_<-AB(Prog, k) by
guessing a run that is length-bounded by the size of the state space and checking
whether it reaches q_final: We store the current state ((St, act, j, c, u), Rl) together
with a binary encoding of the current length of the run. Note that the state
only requires polynomial space. The number of states of Rl_<-AB(Prog, k) is
exponential in the program size as well as k, which means the binary encoding
also requires polynomial space.
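As a rough check of the space bound, the following count is our own and is not taken from the paper: it reads the number of states off the components (St, act, j, c, u) and Rl, ignoring the helper states, which add at most a polynomial factor. Here Q_Prog stands for the set of all thread states and R for the set of registers. Every summand of the logarithm is polynomial in the program size and k, so the binary counter fits in polynomial space.

```latex
\begin{align*}
|Q| &\le \underbrace{|Q_{\mathit{Prog}}|^{|T|}}_{St}\cdot
        \underbrace{|T|^{k}}_{act}\cdot
        \underbrace{(k+1)}_{j}\cdot
        \underbrace{(k+1)^{|X|\,|T|}}_{c}\cdot
        \underbrace{2^{k|X|}}_{u}\cdot
        \underbrace{2^{3|X_{AB}|^{2}}}_{Rl},\\
\log_2 |Q| &\le |T|\log_2|Q_{\mathit{Prog}}| + k\log_2|T|
        + (1+|X|\,|T|)\log_2(k+1) + k|X| + 3|X_{AB}|^{2},\\
|X_{AB}| &= |X| + |R| + k|X| + |T|\,|X|.
\end{align*}
```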

We extend the run by choosing to either perform a context switch or an
operation. We begin with the initial state q_AB^init, which is a special case since we
first need to guess a function act according to the init rule in Figure 4. To perform
an operation, we look at the current state of the active thread St(act(j)), pick
an outgoing transition from the program, and update the state according to the
corresponding rules given in Figure 4 and Figure 5.

We illustrate this on the new-value operation. Assume we pick the outgoing
transition (q_a, x := ⊛, q_b) ∈ Δ_{act(j)}. In this case, we update the state according
to the local rule in Figure 4. Then we update the set Rl according to the new-value
rule in Figure 5. We leave all relations that do not include x unchanged,
and we non-deterministically choose x to be either equal to some variable, or
to be between two other adjacent variables, or to be the largest or smallest
variable. We update the relations to x accordingly. For any other operation,
the changes to Rl are uniquely determined. For writes, we additionally need to
non-deterministically pick some future context j' of the update according to the
write rule in Figure 4. In the case of a context switch, we perform a series of
variable assignments according to the context switch rule.
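The non-deterministic choice for x := ⊛ can be enumerated explicitly. The sketch below assumes that a consistent Rl is represented as an ordered list of equivalence classes of "=" (sorted by "<"), which carries the same information under conditions (i)-(iii); the names `Order` and `new_value_successors` are hypothetical.

```python
from typing import FrozenSet, List

Order = List[FrozenSet[str]]  # equivalence classes of "=", in increasing "<" order

def new_value_successors(order: Order, x: str) -> List[Order]:
    """All abstract states after x := *: relations not involving x are kept, and x
    either joins an existing class or forms a new class in one of the gaps
    (including below the smallest and above the largest class)."""
    rest = [cls - {x} for cls in order]
    rest = [cls for cls in rest if cls]          # drop a class that only held x
    successors: List[Order] = []
    for i in range(len(rest)):                    # x becomes equal to class i
        successors.append(rest[:i] + [rest[i] | {x}] + rest[i + 1:])
    for i in range(len(rest) + 1):                # x lies strictly between classes i-1 and i
        successors.append(rest[:i] + [frozenset({x})] + rest[i:])
    return successors
```

With the other variables split into m classes, there are m + (m + 1) successors, so each step of the guessed run stays within polynomial space.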

Note that we do not explicitly construct the entire Rl_<-AB(Prog, k) transition
system; the program and the rules given in Figure 4 and Figure 5 are sufficient to
guess a run. Each step can be performed in polynomial space. Once St(act(j)) =
q_final holds, we know q_final is reachable. This non-deterministic procedure uses
polynomial space, so the problem is in NPSPACE = PSPACE (by Savitch's theorem).
According to Theorem 3, the problem is PSPACE hard as well.


7 Conclusion 


We examined safety verification of concurrent programs running under TSO that 
operate on variables ranging over an infinite domain. We have shown that this 
is undecidable even if the program can only check the variables for equality and 
non-equality. We studied a context bounded variant of the problem as well. Here, 
we solved the problem for programs using relations in Rl_{<n} and showed that it
is PSPACE complete. 

As future work, we plan to examine more expressive under-approximations 
of the program behaviour than the presented context bounded analysis and how 
these under-approximations affect decidability and complexity of the problem. 
We also intend to explore the problem for additional relations and/or operations 
a program may perform. 
Abstract. The 13th edition of the Competition on Software Verification
(SV-COMP 2024) was the largest competition of its kind so far: A total 
of 76 tools for verification and witness validation were compared. The 
competition evaluated 59 verification systems and 17 validation systems 
from 34 teams from 12 countries. This yields a good overview of the state 
of the art in tools for software verification. The competition was executed 
on a benchmark set with 30300 verification tasks for C programs and 
587 verification tasks for Java programs. The specifications again included 
reachability, memory safety, overflows, and termination. This year was 
the second time that the competition had an extra competition track 
on witness validation. We introduced a new witness format 2.0, and a 
new scoring schema for the validation track. All meta data about the 
verification and validation tools are available in the FM-Tools repository. 


Keywords: Formal Verification · Program Analysis · Competition · Software
Verification · Verification Tasks · Benchmark · Specification · Java Language ·
C Language · SV-COMP · SV-Benchmarks · BENCHEXEC · COVERITEAM


1 Introduction




This report describes the results of the 2024 edition of SV-COMP, and is an
extension of the series of competition reports (see footnote). We also list important
processes and rules, and give insights into some aspects of the competition. The
13th Competition on Software Verification (https://sv-comp.sosy-lab.org/2024) is
again the largest comparative evaluation ever in this area. The objectives of the
competitions were discussed earlier (1-4 [22]) and extended over the years (5-6 [23]):


1. provide an overview of the state of the art in software-verification technology
and increase visibility of the most recent software verifiers,

2. establish a repository of software-verification tasks that is publicly available
for free use as standard benchmark suite for evaluating verification software,

3. establish standards that make it possible to compare different verification
tools, including a property language and formats for the results,


This report extends previous reports on SV-COMP [16, 17, 18, 19, 20, 21, 22, 23, 24, 26, 27]. 


Reproduction packages are available on Zenodo (see Table 3). 
✉ dirk.beyer@sosy-lab.org
4. accelerate the transfer of new verification technology to industrial practice by
identifying the strengths of the various verifiers on a diverse set of tasks,

5. educate PhD students and others on performing reproducible benchmarking, 
packaging tools, and running robust and accurate research experiments, 

6. provide research teams that do not have sufficient computing resources with 
the opportunity to obtain experimental results on large benchmark sets, and 

7. conserve tools for formal methods for later reuse by using a standardized 
format to announce archives (via DOIs), default options, contacts, competition 
participations, and other meta data in a central repository. 


The SV-COMP 2020 report [23] discusses the achievements of the SV-COMP 
competition so far with respect to these objectives. 


Related Competitions. SV-COMP is one of many competitions that measure
progress of research in the area of formal methods [15]. Competitions can lead
to fair and accurate comparative evaluations because of the involvement of the 
developing teams. The competitions most related to SV-COMP are RERS [80], 
VerifyThis [65], Test-Comp [28], and TermCOMP [73]. A previous report [23] 
provides a more detailed discussion. 


Quick Summary of Changes. While we try to keep the setup of the com- 
petition stable, there are always improvements and developments. For the 2024 
edition, the following changes were made: 


• New verification tasks were added, with an increase in C from 23805 in 2023
to 30300 in 2024.

• Tool archives are now uploaded to Zenodo, instead of GitLab, and the meta
data about the tools are hosted and maintained in the Repository for
Formal-Methods Tools (https://gitlab.com/sosy-lab/benchmarking/fm-tools).

• The improved witness format version 2.0 [7] (which is based on YAML instead
of GraphML) was used for the first time.

• The scoring schema for the witness validators [44] was changed based on the
2023 community meeting in Paris.


2 Organization, Definitions, Formats, and Rules 


Procedure. The overall organization of the competition did not change in com- 
parison to the earlier editions [16, 17, 18, 19, 20, 21, 22, 23, 24, 26, 27]. SV-COMP is 
an open competition (also known as comparative evaluation), where all verification 
tasks are known before the submission of the participating verifiers, which is 
necessary due to the complexity of the C language. The procedure is partitioned 
into the benchmark submission phase, the training phase, and the evaluation 
phase. The participants received the results of their verifier continuously via 
e-mail (for preruns and the final competition run), and the results were publicly 
announced on the competition web site after the teams inspected them. 


Competition Jury. Traditionally, the competition jury consists of the chair and
one member of each participating team; the team-representing members circulate
every year after the candidate-submission deadline. This committee reviews the
competition contribution papers and helps the organizer with resolving any
disputes that might occur (cf. competition report of SV-COMP 2013 [17]). The
tasks of the jury were described in more detail in the report of SV-COMP 2022 [26].
The team representatives of the competition jury are listed in Table 5.

Table 1: Scoring schema for SV-COMP 2024 (unchanged from 2021 [24])

Reported result    Points  Description
UNKNOWN               0    Failure to compute verification result
FALSE correct        +1    Violation of property in program was correctly found
                           and a validator confirmed the result based on a witness
FALSE incorrect     -16    Violation reported but property holds (false alarm)
TRUE correct         +2    Program correctly reported to satisfy property
                           and a validator confirmed the result based on a witness
TRUE incorrect      -32    Incorrect program reported as correct (wrong proof)

[Figure: flow from TASK via VERIFIER (true / false / unknown) to WITNESS VALIDATOR
(witness confirmed / unconfirmed / invalid, i.e., error in witness syntax)]
Fig. 1: Visualization of the scoring schema for the reachability property (unchanged
from 2021 [24])


Scoring Schema and Ranking. The scoring schema of SV-COMP 2024 was 
the same as for SV-COMP 2021. Table 1 provides an overview and Fig. 1 visually 
illustrates the score assignment for the reachability property as an example. As 
before, the rank of a verifier was decided based on the sum of points (normalized 
for meta categories). In case of a tie, the rank was decided based on success 
run time, which is the total CPU time over all verification tasks for which the 
verifier reported a correct verification result. Opt-out from Categories and Score
Normalization for Meta Categories was done as described previously [17, page 597].
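The per-task points of Table 1 and the tie-breaking rule can be sketched as follows. The string encoding of results, the function names, and the assumption that a correct result without a confirmed witness scores 0 (which we read from Fig. 1) are ours; the normalization for meta categories is not modeled.

```python
from typing import Dict, List, Tuple

def task_score(reported: str, expected: str, confirmed: bool) -> int:
    """Points for one verification task according to Table 1.

    reported is "TRUE", "FALSE", or "UNKNOWN"; expected is "TRUE" or "FALSE";
    confirmed says whether a witness validator confirmed the result.
    Assumption: a correct result without a confirmed witness earns 0 points."""
    if reported == "UNKNOWN":
        return 0
    if reported != expected:              # wrong answers are penalized
        return -16 if reported == "FALSE" else -32
    if not confirmed:
        return 0
    return 1 if reported == "FALSE" else 2

def rank(results: Dict[str, Tuple[int, float]]) -> List[str]:
    """Orders verifiers by total score (higher first); ties are broken by the
    success run time, i.e., CPU time spent on correct results (lower first)."""
    return sorted(results, key=lambda name: (-results[name][0], results[name][1]))
```

Two verifiers with the same score are thus ordered by how quickly they produced their correct results.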


License Requirements. Starting in 2018, SV-COMP has required that the verifier
be publicly available for download and have a license that

(i) allows reproduction and evaluation by anybody (incl. results publication),
(ii) does not restrict the usage of the verifier output (log files, witnesses), and
(iii) allows (re-)distribution of the unmodified verifier archive via SV-COMP
repositories and archives.

Table 2: Publicly available components for reproducing SV-COMP 2024

Component                Fig. 3  Repository                                       Version
Verification Tasks       (a)     gitlab.com/sosy-lab/benchmarking/sv-benchmarks   svcomp24
Benchmark Definitions    (b)     gitlab.com/sosy-lab/sv-comp/bench-defs           svcomp24
Tool-Info Modules        (c)     github.com/sosy-lab/benchexec                    3.21
Verifiers                (d)     gitlab.com/sosy-lab/benchmarking/fm-tools        svcomp24
Benchmarking             (e)     github.com/sosy-lab/benchexec                    3.21
Witness Format           (f)     gitlab.com/sosy-lab/benchmarking/sv-witnesses    2.0.2
Continuous Integration           gitlab.com/sosy-lab/software/coveriteam          1.1

Table 3: Artifacts published for SV-COMP 2024

Content                    DOI                       Reference
Verification Tasks         10.5281/zenodo.10669723   [31]
Competition Results        10.5281/zenodo.10669731   [30]
Verifiers and Validators   10.5281/zenodo.10669735   [29]
Verification Witnesses     10.5281/zenodo.10669737   [32]
BENCHEXEC                  10.5281/zenodo.10671136   [122]
COVERITEAM                 10.5281/zenodo.10843666   [45]


Task-Definition Format 2.0. SV-COMP 2024 used the task-definition format 
in version 2.0. More details can be found in the report for Test-Comp 2021 [25]. 


Properties. Please see the 2015 competition report [19] for the definition of the 
properties and the property format. All specifications used in SV-COMP 2024 
are available in the directory c/properties/ of the benchmark repository. 


Categories. The community significantly extended the benchmark set for 
SV-COMP 2024. The (updated) category structure of SV-COMP 2024 is shown 
in Fig. 2. We refer to the previous reports for a description and mention only the 
changes here: Compared to SV-COMP 2023, we added two new sub-categories 
ReachSafety-Hardness and ReachSafety-Fuzzle to main category ReachSafety. We 
restructured main category SoftwareSystems as follows: We removed sub-categories
SoftwareSystems-BusyBox-ReachSafety, SoftwareSystems-BusyBox-MemSafety,
and SoftwareSystems-OpenBSD-MemSafety, and added sub-categories
SoftwareSystems-coreutils-MemSafety, SoftwareSystems-coreutils-NoOverflows,
SoftwareSystems-Other-ReachSafety, and SoftwareSystems-Other-MemSafety.
The categories are also listed in Tables 8, 9, and 10, and described in detail on
the competition web site (https://sv-comp.sosy-lab.org/2024/benchmarks.php).


Reproducibility. SV-COMP results must be reproducible, and consequently,
all major components are maintained in public version-control repositories. The
overview of the components is provided in Fig. 3, and the details are given
in Table 2. We refer to the SV-COMP 2016 report [20] for a description of all
components of the SV-COMP organization. Competition artifacts are archived at
Zenodo (see Table 3) to guarantee their long-term availability and immutability.

Fig. 2: Category structure for SV-COMP 2024 (changed from 2023)

[Figure panels: (a) Verification Task, (b) Benchmark Definition, (c) Tool-Info Module,
(d) Tool Archive, (e) Verification Run, (f) Violation Witness, (f) Correctness Witness]
Fig. 3: Benchmarking components of SV-COMP and competition's execution flow
(same as for SV-COMP 2020, except that we now download the tool archives
from Zenodo instead of GitLab)

Table 4: Validation: Witness validators and witness linter

Validator                 Reference     Jury Member            Affiliation
CONCURWITNESS2TEST*       [13]          L. Bajczi              BME Budapest, Hungary
CPACHECKER                [33, 34, 36]  D. Baier               LMU Munich, Germany
CPA-WITNESS2TEST          [35]          T. Lemberger           LMU Munich, Germany
DARTAGNAN                 [106]         H. Ponce de León       Huawei Dresden, Germany
CPROVER-WITNESS2TEST      [35]          (hors concours)        -
GOBLINT*                  [112]         S. Saan                U. of Tartu, Estonia
GWIT                      [81]          F. Howar               TU Dortmund, Germany
JCWIT*                                  Z. Cheng               U. of Manchester, UK
LIV                       [43]          M. Spiessl             LMU Munich, Germany
METAVAL                   [41]          M. Spiessl             LMU Munich, Germany
MOPSA*                    [99]          R. Monat               Inria and U. of Lille, France
NITWIT                    [124]         J. (P.) Berger         RWTH Aachen, Germany
SYMBIOTIC-WITCH           [8]           P. Ayaziová            Masaryk U., Brno, Czechia
UAUTOMIZER                [33, 34]      M. Heizmann            U. of Freiburg, Germany
WIT4JAVA^                 [123]         (hors concours)        -
WITCH                     [7, 9]        P. Ayaziová            Masaryk U., Brno, Czechia
WITNESSLINT               [7]           M. Lingsch-Rosenfeld   LMU Munich, Germany


Competition Workflow. The workflow of the competition is described in 
the report for Test-Comp 2021 [25] (SV-COMP and Test-Comp use a similar 
workflow). For a description of how to reproduce single verification runs and a 
trouble-shooting guide, we refer to the 2022 report [26, Sect. 3]. 
Table 5: Verification: Participating verifiers with tool references and representing jury
members; * for first-time, ^ for hors-concours (RELAY-SV* was not able to qualify)

Participant              Ref.        Jury member          Affiliation
2LS                      [46, 96]    V. Malík             BUT, Czechia
AISE*                    [121]       Z. Chen              NUDT, China
BRICK                    [47]        L. Bu                Nanjing U., China
BUBAAK                   [49]        M. Chalupa           ISTA, Austria
BUBAAK-SPLIT*            [50]        M. Chalupa           ISTA, Austria
CBMC^                    [54, 91]    (h. c.)              -
COASTAL^                 [118]       (h. c.)              -
COVERITEAM-ALGSEL^       [37, 38]    (h. c.)              -
COVERITEAM-PARPORT^      [37, 38]    (h. c.)              -
CPACHECKER               [10, 39]    D. Baier             LMU Munich, Germany
CPALOCKATOR^             [5, 6]      (h. c.)              -
CPA-BAM-BNB^             [4, 120]    (h. c.)              -
CPA-BAM-SMG^                         (h. c.)              -
CPV*                     [53]        P.-C. Chien          LMU Munich, Germany
CRUX^                    [64, 113]   (h. c.)              -
CSEQ^                    [59, 85]    (h. c.)              -
DARTAGNAN                [71, 105]   H. Ponce de León     Huawei Dresden, Germany
DEAGLE                   [76]        F. He                Tsinghua U., China
DIVINE^                  [14, 92]    (h. c.)              -
EBF                      [3]         F. Aljaafari         U. of Manchester, UK
EMERGENTHETA*            [11]        L. Bajczi            BME Budapest, Hungary
ESBMC-INCR^              [55, 58]    (h. c.)              -
ESBMC-KIND               [70, 97]    F. Brauße            U. Manchester, UK
FRAMA-C-SV               [42, 60]    M. Spiessl           LMU Munich, Germany
GAZER-THETA^             [1, 75]     (h. c.)              -
GDART                    [101]       F. Howar             TU Dortmund, Germany
GDART-LLVM^                          (h. c.)              -
GOBLINT                  [111, 119]  S. Saan              U. Tartu, Estonia
GRAVES-CPA^              [93]        (h. c.)              -
GRAVES-PAR^                          (h. c.)              -
INFER^                   [48, 89]    (h. c.)              -
JAVA-RANGER^             [82, 115]   (h. c.)              -
JAYHORN                  [88, 114]   H. Mousavi           U. Tehran, TIAS, Iran
JBMC                     [56, 57]    P. Schrammel         U. Sussex / Diffblue, UK
JDART^                   [95, 100]   (h. c.)              -
KORN                     [67, 68]    G. Ernst             LMU Munich, Germany
LAZY-CSEQ^               [83, 84]    (h. c.)              -
LF-CHECKER^                          (h. c.)              -
LOCKSMITH^               [107]       (h. c.)              -
MLB                                  L. Bu                Nanjing U., China
MOPSA                    [87, 99]    R. Monat             Inria and U. Lille, France
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Table 5: Competition candidates (continued) 
Participant Ref. Jury member Affiliation 
PESCo-CPA? [109, 110] (h. c.) = 
PICHECKER? [116] (h. c.) = 
PINAKA® [52] (h. c.) = 
PREDATORHP [79,104] V. Soková BUT, Czechia 
Proton" [98] R. Metta TCS, India 
SPF? [102, 108] (h. c.) = 
SV-SANITIZERS "^" S. Saan U. of Tartu, Estonia 
SWAT" [94] N. Loose U. of Luebeck, Germany 
SYMBIOTIC [51,86] M. Jonáš Masaryk U., Czechia 
THETA [12,117] L. Bajezi BME Budapest, Hungary 
UAUTOMIZER [77,78] M. Heizmann U. Freiburg, Germany 
UGEMCUTTER [69,90] D. Klumpp U. Freiburg, Germany 
UKoJak [66, 103] F. Schüssele U. Freiburg, Germany 
UTAIPAN [63,74] | D. Dietsch U. Freiburg, Germany 
VERIABS [2,601] P. Darke TCS, India 
VERIABSL [62] P. Darke TCS, India 
VERIOOVER? (h. c.) - 


Table 6: Algorithms and techniques that the participating verification systems used;
(new) for first-time participants, ∅ for hors-concours participation

[Check-mark matrix: one row per verifier of Table 5; columns: CEGAR, Predicate Abstraction, Symbolic Execution, Bounded Model Checking, k-Induction, Property-Directed Reach., Explicit-Value Analysis, Numeric. Interval Analysis, Shape Analysis, Separation Logic, Bit-Precise Analysis, ARG-Based Analysis, Lazy Abstraction, Interpolation, Automata-Based Analysis, Concurrency Support, Ranking Functions, Evolutionary Algorithms, Algorithm Selection, Portfolio]


Table 7: Solver libraries and frameworks that are used as components in the participating
verification systems (component is mentioned if used more than three times; (new) for
first-time participants, ∅ for hors-concours participation)

[Check-mark matrix: one row per verifier of Table 5; columns: CPACHECKER, CPROVER, ESBMC, JPF, ULTIMATE, JavaSMT, MATHSAT, CVC4, SMTINTERPOL, Z3, MINISAT, APRON]


3 Participating Verifiers and Validators 


The participating verification systems are listed in Table 5. The table contains 
the verifier name (with hyperlink), references to papers that describe the systems, 
the representing jury member and the affiliation. The listing is also available on 
the competition web site at https://sv-comp.sosy-lab.org/2024/systems.php. Table 6
lists the algorithms and techniques that are used by the verification tools, and 
Table 7 gives an overview of commonly used solver libraries and frameworks. 


Validation of Verification Results. The verification results were validated
by 17 validation tools (16 proper witness validators and one witness linter for
syntax checks), which are listed in Table 4, including references to the
literature. The witness validators are evaluated based on all verification
witnesses that were produced in the verification track of the competition.


Hors-Concours Participation. As in previous years, we also included in the
evaluation verifiers that did not actively compete or that should not occur in the
rankings for other reasons (e.g., meta verifiers based on other competing tools, or
tools for which the submitting teams were not sure whether they show the full potential
of the tool). These participations are called hors concours, as they cannot participate
in rankings and cannot “win” the competition. Those verifiers are marked as ‘hors
concours’ in Table 5 and others, and their names are annotated with a symbol (∅).


4 Results of the Verification Track 


The results of the competition represent the state of the art of what can be
achieved with fully automatic software-verification tools on the given benchmark
set. We report the effectiveness (number of verification tasks that can be solved
Table 8: Verification: Quantitative overview over all regular results; empty cells are used
for opt-outs; (new) for first-time participants; the number of tasks includes invalid tasks
that were excluded from scoring by the jury (details available on web site or in artifact)

Participant ReachSafety MemSafety ConcurrencySafety NoOverflows Termination SoftwareSystems FalsificationOverall Overall JavaOverall
2LS 6000 224 0 5976 1584 10 1311 10564
AISE (new)
BRICK
BUBAAK 3788 1890 11 6465 1481 -1082 -617 12206
BUBAAK-SPLIT (new) 4692 1312 7 -41374 661 872 1959 -18177
CPACHECKER 10084 1897 2029 8603 1195 784 4812 21568
CPV (new) 6330
DARTAGNAN 3547
DEAGLE
EBF 636
EMERGENTHETA (new) 1178
ESBMC-KIND 8364 2077 1853 8272 1048 -1063 2394 17896
FRAMA-C-SV 1098
GDART 616
GOBLINT 2289 1304 2583 7059 890 536 15458
JAYHORN 325
JBMC 618
KORN
MLB 676
MOPSA 2241 1516 8063 2197
PREDATORHP 2321
PROTON (new) 3526
SV-SANITIZERS (new) 290
SWAT (new) 566
SYMBIOTIC 7052 2156 238 7370 1258 687 4050 17192
THETA 2119 2354
UAUTOMIZER 6320 2110 3079 9497 3248 261 3139 26396
UGEMCUTTER 3189
UKOJAK 4869 1400 0 7363 0 233 2291 10593
UTAIPAN 5751 2014 2655 9231 0 351 3157 18042
VERIABS 10541
VERIABSL 10735
Table 9: Verification: Quantitative overview over all hors-concours results; empty cells
represent opt-outs, ∅ for hors-concours participation; the number of tasks includes invalid
tasks that were excluded from scoring by the jury (details available on web site or in
artifact)

Participant ReachSafety MemSafety ConcurrencySafety NoOverflows Termination SoftwareSystems FalsificationOverall Overall JavaOverall
CBMC ∅ 1269 1330 1229 5771 1125 -2569 -3764 8391
COASTAL ∅ -2752
CVT-ALGOSEL ∅ 2635 41
CVT-PARPORT ∅ -6152 – 1655 911 -17812 1289 -1297 -9118 -7545
CPA-BAM-BNB ∅ -2439
CPA-BAM-SMG ∅ 2039 -2804
CPALOCKATOR ∅ -4924
CRUX ∅ 2066 490
CSEQ ∅ -12478
DIVINE ∅ 4655 298 390 0 0 76 256 3576
ESBMC-INCR ∅ 542
GAZER-THETA ∅
GDART-LLVM ∅
GRAVES-CPA ∅ 3831 -322 -1538 5470
GRAVES-PAR ∅ 876 1627 53 -17650 1256 -2037 -9024 -6731
INFER ∅ -99128 -8289 -73312 -24917
JAVA-RANGER ∅ 398
JDART ∅ 382
LAZY-CSEQ ∅ -15024
LF-CHECKER ∅ VIZ
LOCKSMITH ∅
PESCO-CPA ∅ 5814 -76 3247 17315
PICHECKER ∅ 521
PINAKA ∅ 2418 1337 855
SPF ∅ 182
VERIOOVER ∅


and correctness of the results, as accumulated in the score) and the efficiency
(resource consumption in terms of CPU time). The results are presented in the
same way as in previous years, so that improvements over previous years are
easy to identify. The results presented in this report were inspected and
approved by the participating teams.




Table 10: Verification: Overview of the top-three verifiers for each category; values for
CPU time rounded to two significant digits; (new) for first-time participants

Rank Verifier Score CPU Time (in h) Solved Tasks Unconf. Tasks False Alarms Wrong Proofs

ReachSafety
1 VERIABSL 10735 190 7075 1138 2
2 VERIABS 10541 190 6720 1032 1
3 CPACHECKER 10084 200 6468 286 2

MemSafety
1 PREDATORHP 2321 1.2 1823 3 3
2 SYMBIOTIC 2156 0.77 1855 0 5
3 UAUTOMIZER 2110 62 1637 4

ConcurrencySafety
1 DARTAGNAN 3547 14 2086 0 5
2 UGEMCUTTER 3189 32 1851 4 1
3 UAUTOMIZER 3079 28 1791 3 1

NoOverflows
1 UAUTOMIZER 9497 62 4532 2
2 UTAIPAN 9231 66 4420 11 1
3 CPACHECKER 8603 18 5596 192

Termination
1 PROTON (new) 3526 19 1888 126 1
2 UAUTOMIZER 3248 18 1631 11
3 2LS 1584 4.2 1167 201

SoftwareSystems
1 MOPSA 2197 15 2030 0
2 BUBAAK-SPLIT (new) 872 0.42 480 163 8
3 CPACHECKER 784 43 1756 71

FalsificationOverall
1 CPACHECKER 4812 91 4920 218 10
2 SYMBIOTIC 4050 27 4281 191 11
3 UTAIPAN 3157 33 1602 34 1

Overall
1 UAUTOMIZER 26396 290 13617 114 3 7
2 CPACHECKER 21568 320 17968 698 16 1
3 UTAIPAN 18042 240 11524 71 1 13

JavaOverall
1 MLB 676 0.93 484 34
2 JBMC 618 0.44 424 80
3 GDART 616 2.6 453 9


Quantitative Results. Tables 8 and 9 present the quantitative overview of all 
tools and all categories. Due to the large number of tools, we need to split the 
presentation into two tables, one for the verifiers that participate in the rankings 
(Table 8), and one for the hors-concours verifiers (Table 9). The head row mentions 
the category, the maximal score for the category, and the number of verification 
Fig. 4: Quantile functions for category C-Overall. Each quantile function illustrates
the quantile (x-coordinate) of the scores obtained by correct verification runs
below a certain run time (y-coordinate). More details were given previously [17].
A logarithmic scale is used for the time range from 1 s to 1000 s, and a linear scale
is used for the time range between 0 s and 1 s.


tasks. The verification tasks consist of tasks with expected verdict TRUE, tasks with
expected verdict FALSE, and tasks that are VOID (tasks that were excluded from
scoring by the jury). The tools are listed in alphabetical order; every table row lists
the scores of one verifier. We indicate the top three candidates by formatting their
scores in bold face and in larger font size. An empty table cell means that the verifier
opted out of the respective main category (perhaps participating in subcategories
only, restricting the evaluation to a specific topic; DEAGLE was disqualified by
the jury, with details on the web site). More information (including interactive
tables, quantile plots for every category, and also the raw data in XML format) is
available on the competition web site (https://sv-comp.sosy-lab.org/2024/results)
and in the results artifact (see Table 3).

Table 10 reports the top three verifiers for each category. The run time (column 
‘CPU Time’) refers to successfully solved verification tasks (column ‘Solved Tasks’). 
We also report the number of tasks for which no witness validator was able to 
confirm the result (column ‘Unconf. Tasks’). The columns ‘False Alarms’ and 
‘Wrong Proofs’ report the number of verification tasks for which the verifier 
reported wrong results, i.e., reporting a counterexample when the property holds 
(incorrect FALSE) and claiming that the program fulfills the property although 
it actually contains a bug (incorrect TRUE), respectively. 


Score-Based Quantile Functions for Quality Assessment. We use score-
based quantile functions [17,40] because these visualizations make it easier to
understand the results of the comparative evaluation. The results archive (see
Table 3) and the web site (https://sv-comp.sosy-lab.org/2024/results) include such
a plot for each (sub-)category. As an example, we show the plot for category


Table 11: New verifiers in SV-COMP 2023 and SV-COMP 2024; column ‘Sub-
categories’ gives the number of executed categories; (new) for first-time participants
in 2024; ∅ for those that were hors-concours participants in 2024

Verifier Language First Year Sub-categories
AISE (new) C 2024 1
BUBAAK-SPLIT (new) C 2024 45
CPV (new) C 2024 20
EMERGENTHETA (new) C 2024 15
PROTON (new) C 2024 5
SV-SANITIZERS (new) C 2024 13
SWAT (new) Java 2024 1
BUBAAK C 2023 40
GDART-LLVM ∅ C 2023 1
GRAVES-PAR ∅ C 2023 40
LF-CHECKER ∅ C 2023 3
MOPSA C 2023 32
PICHECKER ∅ C 2023 1
VERIABSL C 2023 13
VERIOOVER ∅ C 2023 1
MLB Java 2023 1


C-Overall (all verification tasks) in Fig. 4. A total of 16 verifiers participated in 
category C-Overall, for which the quantile plot shows the overall performance over 
all categories (scores for meta categories are normalized [17]). A more detailed 
discussion of score-based quantile plots, including examples of what insights one 
can obtain from the plots, is provided in previous competition reports [17, 20]. 
The winner of the competition, UAUTOMIZER, achieves the best cumulative
score (the graph for UAUTOMIZER has the longest width from its left to its right
end; the graph starts to the left of x = 0 because the verifier produced 7 wrong proofs
and 4 false alarms and therefore received some negative points). Other verifiers
whose graphs start with a negative cumulative score also produced wrong results.
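To make the construction of such a plot concrete, the following Python sketch computes the points of a score-based quantile function as described above: the negative points of all wrong results are accumulated first (which is why a graph can start to the left of x = 0), and the correct results are then added from the fastest to the slowest run. The input format (score, CPU time, correctness flag) and the example scores are assumptions for illustration; the normalization of meta-category scores [17] is not modeled.

```python
from typing import List, Tuple

# One run: (score, cpu_seconds, correct). This input format is a hypothetical choice.
Run = Tuple[float, float, bool]


def score_quantile_points(runs: List[Run]) -> List[Tuple[float, float]]:
    """Return (x, y) points: x is the cumulative score, y the CPU time of the
    slowest correct run included so far."""
    # Wrong results contribute their (negative) points up front.
    x = sum(score for score, _, correct in runs if not correct)
    points = []
    # Correct results are added in order of increasing CPU time.
    for score, cpu, _ in sorted((r for r in runs if r[2]), key=lambda r: r[1]):
        x += score
        points.append((x, cpu))
    return points


if __name__ == "__main__":
    # Illustrative scores: two correct proofs, one correct alarm, one wrong proof.
    runs = [(2, 10.0, True), (1, 3.0, True), (-32, 50.0, False), (2, 7.0, True)]
    print(score_quantile_points(runs))
```

Because the wrong proof is counted before any correct result, the first point of the example lies at x = -31 rather than at a positive score.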
New Verifiers. To acknowledge the verification systems that participate in
SV-COMP for the first or second time, Table 11 lists the new verifiers (in
SV-COMP 2023 or SV-COMP 2024). Figure 5 shows the growing interest in
the competition over the years.


Computing Resources. The CPU time and memory limits were the same
as in the previous competitions [20] (15 GB of memory and 15 min of CPU
time), but we reduced the number of processing units per run from 8 to 4.
This has the disadvantage that the measurements are less precise due to
shared resources in the machine, but it roughly doubles the throughput. This
change was necessary because of the ever-increasing number of participating
systems and the continuously growing benchmark set. Witness validation was
again limited to 2 processing units, 7 GB of memory, and 1.5 min of CPU time
for violation witnesses and 15 min of CPU time for correctness witnesses. The
machines for running the experiments are part of a compute cluster at the
SoSy-Lab at LMU that consists of 168 machines, where each machine has one
Intel Xeon E3-1230 v5 CPU with 8 processing units, a frequency of 3.4 GHz,
33 GB of RAM, and a GNU/Linux operating system (x86_64-linux, Ubuntu 22.04
with Linux kernel 5.15). We used BENCHEXEC [40] to measure and control
computing resources (CPU time, memory) and VCLOUD to distribute, install,
run, and clean up verification runs, and to collect the results. The values
for the time are accumulated over all cores of the CPU.

To give an impression of the overall computation work, we report some
statistics: One complete verification execution of the competition consisted of
787 779 verification runs (each verifier on each verification task of the selected
categories according to the opt-outs), consuming 2 104 days of CPU time (without
validation). This is almost double the CPU time spent in the previous edition
of SV-COMP. Witness-based result validation required 13.6 million validation
runs in 21243 run sets (each validator on each verification task for categories
with witness validation, and for each verifier), consuming 2290 days of CPU
time. Each tool was executed several times to make sure that no installation
issues occur during the execution.
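As a rough sanity check of these numbers, the sketch below derives the average CPU time per run and the number of runs that fit on the cluster at the same time. It uses only the figures stated above and assumes that runs are packed onto machines purely by processing units; the helper names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def avg_cpu_seconds_per_run(cpu_days: float, runs: int) -> float:
    """Average CPU time per run, given the total CPU time in days."""
    return cpu_days * SECONDS_PER_DAY / runs


def concurrent_runs(machines: int, units_per_machine: int, units_per_run: int) -> int:
    """Runs that fit at once if each run gets a fixed share of processing units."""
    return machines * (units_per_machine // units_per_run)


verification_avg = avg_cpu_seconds_per_run(2104, 787_779)
validation_avg = avg_cpu_seconds_per_run(2290, 13_600_000)

if __name__ == "__main__":
    print(f"{verification_avg:.0f} s per verification run")
    print(f"{validation_avg:.1f} s per validation run")
    print(concurrent_runs(168, 8, 8), "->", concurrent_runs(168, 8, 4), "concurrent runs")
```

Under these assumptions, a verification run consumes about 231 s of CPU time on average and a validation run about 14.5 s. Halving the processing units per run doubles the number of concurrent runs from 168 to 336, which matches the roughly doubled throughput mentioned above; two runs with 15 GB each also fit into the 33 GB of RAM of one machine.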


5 Results of the Witness-Validation Track 


The validation of verification results, in particular verification witnesses, is
becoming increasingly important for various reasons: verification witnesses justify
a verification result and help to understand and interpret it, they serve as exchange
objects for intermediate results, and they make it possible to use imprecise verification
techniques (e.g., via machine learning). A case study on the quality of the results
of witness validators [44] suggested that validators for verification results should
also undergo a periodic comparative evaluation and proposed a scoring schema
for witness-validation results. SV-COMP 2024 evaluated a total of 17 validators
on 100 998 correctness and 71577 violation witnesses in format 1.0, and 45614
correctness and 27561 violation witnesses in format 2.0. Figure 6 shows the
growing importance of evaluating witness validators.
Fig. 7: Scoring schema for evaluation of validators; p = −16 for SV-COMP 2024;
figure adapted from [44]; changed scores compared to 2023 are highlighted in red
Scoring Schema for Validation Track. The score of a validator in a sub-
category is computed as

$$\mathit{score} = \left( \frac{p_{\mathit{correct}}}{|\mathit{correct}|} + \frac{p_{\mathit{wrong}}}{|\mathit{wrong}|} \right) \cdot \frac{|\mathit{correct}| + |\mathit{wrong}|}{2}$$

where the points in $p_{\mathit{correct}}$ and $p_{\mathit{wrong}}$ are determined according to the schema in
Fig. 7 and then normalized using the normalization schema that SV-COMP uses for
meta categories [17, page 597] (note that, in comparison to last year [27, page 513],
the factor q is removed from the formula, because it is not necessary to give a higher
Table 12: Validation of correctness witnesses (version 2.0): Overview of the top-three
validators for each category; values for CPU time rounded to two significant digits

Rank Validator Score CPU Time (in h) Solved Tasks False Alarms Wrong Proofs

ReachSafety
1 UAUTOMIZER 4545 69 3830
2 MOPSA (new) 3284 17 2816
3 CPACHECKER 2872 23 2784
MemSafety
1 UAUTOMIZER 5213 190 5701
2 MOPSA (new) 5015 2.5 5658
3 GOBLINT (new) 4225 0.26 5677
ConcurrencySafety
1 GOBLINT (new) 0 0 0
2 missing validator 0 0 0
3 missing validator 0 0 0
NoOverflows
1 UAUTOMIZER 25441 220 17913
2 MOPSA (new) 23601 7.8 17333
3 GOBLINT (new) 17143 0.77 14125
Termination
1 GOBLINT (new) 0 0 0
2 missing validator 0 0 0
3 missing validator 0 0 0
SoftwareSystems
1 MOPSA (new) 3521 23 6102
2 GOBLINT (new) 2793 9.2 4636
3 UAUTOMIZER 1258 90 5963 14
Overall
1 UAUTOMIZER 20919 570 33407 14
2 MOPSA (new) 20889 50 31909
3 GOBLINT (new) 16186 11 26224


weight to wrong witnesses anymore). Witnesses that do not agree with the expected
verification verdict are classified as wrong. Witnesses that agree with the expected
verification verdict can nevertheless be wrong, for
example, if a violation witness has a wrong path to the violation, or a correctness
witness has an invariant that does not hold. Therefore, we use the information
from the majority of the validators: a witness that agrees with the expected
verification result is classified as correct if at least 75 % of the true/false results
from validators confirm the result, and as wrong if at least 75 % of the true/false
results from validators refute this result (and there must be at least 2 true/false
results). Further details are given in the proposal [44]. This schema relates to
Table 13: Validation of correctness witnesses (version 1.0): Overview of the top-three
validators for each category; values for CPU time rounded to two significant digits

Rank Validator Score CPU Time (in h) Solved Tasks False Alarms Wrong Proofs

ReachSafety
1 UAUTOMIZER 28020 540 21331
2 CPACHECKER 25183 250 29082
3 LIV -44060 3.7 2527 31
MemSafety
1 UAUTOMIZER 259 4.8 366
2 CPV (new) 87 0.22 186 6
3 METAVAL 0 0 0
ConcurrencySafety
1 UAUTOMIZER 70 2.6 120
2 missing validator 0 0 0
3 missing validator 0 0 0
NoOverflows
1 CPACHECKER 57309 170 45618 9
2 UAUTOMIZER 56467 320 42011 2
3 METAVAL 0 0 0
Termination
1 missing validator 0 0 0
2 missing validator 0 0 0
3 missing validator 0 0 0
SoftwareSystems
1 CPACHECKER 3275 28 5812
2 UAUTOMIZER 2211 240 13916 18
3 LIV 0 0 0
Overall
1 UAUTOMIZER 47571 1100 77744 20
2 CPACHECKER 35095 450 80512 9
3 METAVAL -38172 1300 44296 504


each base category from the verification track a meta category that consists of 
two sub-categories, one with the correct and one with the wrong witnesses. 
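The classification rule and the scoring formula above can be sketched in a few lines of Python. This is a hypothetical illustration rather than the SV-COMP implementation: the labels "confirm"/"refute", the treatment of an empty sub-category (its term counts as 0), and the use of exact fractions for the 75 % threshold are our own choices.

```python
from fractions import Fraction
from typing import Optional, Sequence


def classify_witness(agrees_with_expected: bool,
                     validator_results: Sequence[str]) -> Optional[str]:
    """Classify a witness as 'correct', 'wrong', or None (undecided).

    validator_results holds one entry, "confirm" or "refute", per validator that
    produced a true/false result; other outcomes are assumed to be filtered out.
    """
    if not agrees_with_expected:
        return "wrong"
    total = len(validator_results)
    if total < 2:
        return None  # fewer than 2 true/false results: no decision
    confirms = sum(1 for r in validator_results if r == "confirm")
    refutes = total - confirms
    if Fraction(confirms, total) >= Fraction(3, 4):
        return "correct"
    if Fraction(refutes, total) >= Fraction(3, 4):
        return "wrong"
    return None  # no 75 % majority either way


def validation_score(p_correct: float, p_wrong: float,
                     n_correct: int, n_wrong: int) -> float:
    """score = (p_correct/|correct| + p_wrong/|wrong|) * (|correct| + |wrong|) / 2."""
    avg_correct = p_correct / n_correct if n_correct else 0.0
    avg_wrong = p_wrong / n_wrong if n_wrong else 0.0
    return (avg_correct + avg_wrong) * (n_correct + n_wrong) / 2


if __name__ == "__main__":
    print(classify_witness(True, ["confirm", "confirm", "confirm", "refute"]))
    print(validation_score(12, 3, 6, 3))
```

The formula weights both sub-categories equally by their average points per witness and then scales back to the total number of witnesses, so a sub-category with many correct witnesses cannot dominate one with few wrong witnesses.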
Tables 12, 13, and 14 show the rankings of the validators. Violation witnesses
in format version 2.0 were not yet ranked, because the jury decided that in
SV-COMP 2024 this is a demonstration track. The score results for all validators
and all categories are available on the SV-COMP web site¹ and in the artifact [30].
Wrong proofs in Tables 12 and 13 are claims of a validator that the program
is correct according to invariants in a given correctness witness although the
program contains a bug (the validator confirms a wrong correctness witness).
False alarms in Table 14 are claims of a validator that the program contains

¹ https://sv-comp.sosy-lab.org/2024/results/results-validated/
Table 14: Validation of violation witnesses (version 1.0): Overview of the top-three
validators for each category; values for CPU time rounded to two significant digits

Rank Validator Score CPU Time (in h) Solved Tasks False Alarms Wrong Proofs

ReachSafety
1 UAUTOMIZER 24390 120 15932 4
2 CPROVER-W2T 22251 18 16970 2
3 CPACHECKER 15686 83 16602 71
MemSafety
1 SYMBIOTIC-WITCH 799 0.59 1723
2 CPACHECKER 626 3.7 1570 6
3 UAUTOMIZER 472 5.2 809
ConcurrencySafety
1 DARTAGNAN 9186 37 8674
2 UAUTOMIZER 6742 72 6533
3 CPACHECKER 2110 14 3061 28
NoOverflows
1 UAUTOMIZER 20030 63 10236 5
2 CPACHECKER 18892 281 14323
3 CPROVER-W2T 18400 7.7 13679 18
Termination
1 UAUTOMIZER 692 7.0 1004
2 METAVAL 0 0 0
3 CPACHECKER -1496 5.3 993 26
SoftwareSystems
1 UAUTOMIZER 2633 26 3036 2
2 SYMBIOTIC-WITCH 1696 0.59 1113
3 CPACHECKER 1359 15 2474
Overall
1 UAUTOMIZER 43235 290 37550 11
2 SYMBIOTIC-WITCH 20980 42 27484 4
3 CPROVER-W2T 19651 27 32936 178


a bug described by a given violation witness although the program is correct
(the validator confirms a wrong violation witness).

The adoption rate of the new witness format version 2.0 is discussed in the
article that defines the format [7]. Tables 12 and 13 show that there are categories
that are still supported by fewer than three validators (‘missing validators’ for
categories ConcurrencySafety and Termination).


While there are six new validators in SV-COMP 2024 (Fig. 6), and while
the new witness format 2.0 is being adopted at a high rate (Table 12),
there is still a remarkable gap in software-verification research: There are
verification results that cannot yet be independently confirmed.
6 Conclusion


The 13th edition of the Competition on Software Verification (SV-COMP 2024) 
again compared automatic tools for software verification and the validation of 
the produced verification witnesses. SV-COMP again had a record number of 
59 participating verification systems (incl. 7 new verifiers and 19 hors-concours; 
see Fig. 5 for the participation numbers and Table 5 for the details). Furthermore, 
the validation track compared 17 validation tools; the validation tools were 
assessed in a similar manner as in the verification track, using a community- 
agreed scoring schema. The number of verification tasks in SV-COMP 2024 was 
significantly increased to 30300 in the C category. Table 10 shows that the top 
verification tools have an extremely low number of wrong results. However, there 
are still wrong results, and validation of the verification results is absolutely 
necessary. We hope that this overview and the competition lead to a broader
adoption of software verification, and in particular, that more and better validation 
tools are developed in the near future. 


Data-Availability Statement. The verification tasks and results of the compe-
tition are published at Zenodo, as described in Table 3. All components and data that
are necessary for reproducing the competition are available in public version
repositories, as specified in Table 2. For easy access, the results are also presented online
on the competition web site https://sv-comp.sosy-lab.org/2024. The main results of
last year’s competition were reproduced in an independent reproduction study [72].
Abstract. CONCURRENTWITNESS2TEST is a violation witness validator
for concurrent software. Taking into account both the nondeterminism of data
and the interleaving-based nondeterminism, the tool aims to use the metadata
described in the violation witnesses to synthesize an executable test harness.
While some initial challenges are yet to be overcome, the validation performance
of CONCURRENTWITNESS2TEST corroborates the usefulness of the proposed
approach.
1 Validation Approach


There are multiple violation witness validators in the ReachSafety category of
SV-COMP that are based on test-harness generation [3]. However, none of them
takes part in the category for concurrent programs, presumably due to the increased
complexity of orchestrating the different thread interleavings prescribed by the
witness files. CONCURRENTWITNESS2TEST aims to fill this gap by providing
an enhanced test harness that takes into account not only the data nondeterminism
but also the nondeterminism caused by concurrency. In this paper, we
concentrate on solving the latter, as the former is already well documented by
the implementing tools [3].

The current witness format for concurrent software defines two edge data
fields that we can extract information from [3]:


createThread: The unique ID of the new thread that results from the execution 
of the containing edge 

threadId: Which thread is currently active when the containing edge is exe- 
cuted. Valid values have at least one createThread entry in the witness 
automaton that must be executed prior to the current edge 
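To illustrate how these two fields can be extracted, here is a minimal sketch that reads the edge data of a GraphML witness. CONCURRENTWITNESS2TEST itself uses networkx for this; the sketch swaps in Python's standard xml.etree.ElementTree to stay self-contained. It assumes that the GraphML key ids are literally threadId and createThread, that the main thread has the ID 0, and that the edges appear in document order along a single path; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def thread_edges(graphml_text):
    """Return (source, target, threadId, createThread) for every witness edge."""
    root = ET.fromstring(graphml_text)
    edges = []
    for edge in root.iterfind(".//g:edge", GRAPHML_NS):
        data = {d.get("key"): (d.text or "").strip()
                for d in edge.iterfind("g:data", GRAPHML_NS)}
        edges.append((edge.get("source"), edge.get("target"),
                      data.get("threadId"), data.get("createThread")))
    return edges


def undeclared_thread_ids(edges, main_thread="0"):
    """List threadId values used before any createThread introduced them."""
    created = {main_thread}
    missing = []
    for _, _, thread_id, create_thread in edges:
        if thread_id is not None and thread_id not in created:
            missing.append(thread_id)
        if create_thread is not None:
            created.add(create_thread)
    return missing


# Hypothetical two-edge witness: thread 0 creates thread 1, which then acts.
SAMPLE_WITNESS = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="threadId" for="edge" attr.name="threadId" attr.type="string"/>
  <key id="createThread" for="edge" attr.name="createThread" attr.type="string"/>
  <graph edgedefault="directed">
    <node id="N0"/><node id="N1"/><node id="N2"/>
    <edge source="N0" target="N1">
      <data key="threadId">0</data>
      <data key="createThread">1</data>
    </edge>
    <edge source="N1" target="N2">
      <data key="threadId">1</data>
    </edge>
  </graph>
</graphml>"""

if __name__ == "__main__":
    edges = thread_edges(SAMPLE_WITNESS)
    print(edges)
    print(undeclared_thread_ids(edges))
```

A real witness automaton can branch, so a complete implementation would have to follow the automaton's paths instead of the document order used here.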
Using these pieces of information, we insert a yield and a release call around
each action (as seen in the example in Figure 2, based on the metadata from
Figure 1), with the parameter target increasing at every encountered edge. These
functions are shown in Figure 3 and Figure 4, respectively. They rely on a shared
variable current denoting the next value where the functions need to take effect
(to handle revisited locations in the source, e.g., in a loop), alongside a mutex
and a condition variable. Locking and unlocking in the figures refer to operations
on the mutex variable, while broadcasting and waiting refer to operations on the
condition variable.
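The yield/release protocol can be sketched with a single shared counter guarded by a condition variable (Python's threading.Condition bundles the mutex and the condition variable). This Python version only mirrors the idea of the C calls that CONCURRENTWITNESS2TEST inserts: the class and method names are hypothetical (yield_ because yield is a Python keyword), and the rule that a call whose target has already passed takes no effect is our assumption about how revisited locations are handled.

```python
import threading


class WitnessScheduler:
    """Hypothetical sketch: forces actions to happen in the order 0, 1, 2, ..."""

    def __init__(self):
        self.current = 0                   # next target allowed to run
        self.cond = threading.Condition()  # mutex + condition variable

    def yield_(self, target):
        # Lock, then wait until it is this action's turn.
        with self.cond:
            while self.current < target:
                self.cond.wait()

    def release(self, target):
        # Advance only if this target is the current one; a revisited location
        # whose target has already passed leaves `current` unchanged.
        with self.cond:
            if self.current == target:
                self.current = target + 1
            self.cond.notify_all()  # broadcast to all waiting threads


def run_schedule(schedule):
    """Run one thread per entry of {name: [targets]} and record the action order."""
    sched = WitnessScheduler()
    trace = []

    def worker(name, targets):
        for target in targets:
            sched.yield_(target)
            trace.append((target, name))  # the witness action happens here
            sched.release(target)

    threads = [threading.Thread(target=worker, args=(n, ts)) for n, ts in schedule.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return trace


if __name__ == "__main__":
    print(run_schedule({"A": [1, 3], "B": [0, 2]}))
```

Regardless of how the operating system schedules the two threads, the actions are executed in target order. If a target value never occurs, the waiting threads would block forever, so a practical harness needs a timeout for that case.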

One of the main obstacles to overcome is the resolution of the threadId
metadata. In our experience, none of the tools produce fully specified witnesses
in terms of interleavings, i.e., not every action in the program is totally ordered.
While this is acceptable according to the witness format [3], a certain level
of nondeterminism might remain in the program after applying the witness.
To overcome this problem, we rely on statistics, i.e., we execute the resulting
harness multiple times and classify the results as always observable, sometimes
observable, or never observable. Observability refers to that of the error state,
tested by inspecting the exit code of the program. At SV-COMP’24, we opted to
refuse only witnesses with a never-observable verdict.
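A minimal sketch of this statistical classification, assuming that each run has already been reduced to a Boolean stating whether the error state was observed (for example, via the exit code). The names are hypothetical; the require_always switch models the stricter acceptance criterion that the authors plan for future competitions.

```python
from enum import Enum
from typing import Iterable


class Observability(Enum):
    ALWAYS = "always observable"
    SOMETIMES = "sometimes observable"
    NEVER = "never observable"


def classify(outcomes: Iterable[bool]) -> Observability:
    """outcomes: one entry per harness run, True if the error state was reached."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("at least one run is needed")
    if all(outcomes):
        return Observability.ALWAYS
    if any(outcomes):
        return Observability.SOMETIMES
    return Observability.NEVER


def verdict(obs: Observability, require_always: bool = False) -> str:
    """SV-COMP'24 rule: refuse only never-observable witnesses."""
    if require_always:
        return "confirm" if obs is Observability.ALWAYS else "refuse"
    return "refuse" if obs is Observability.NEVER else "confirm"


if __name__ == "__main__":
    obs = classify([True, False, True])
    print(obs.value, verdict(obs), verdict(obs, require_always=True))
```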


2 Software Architecture 


CONCURRENTWITNESS2TEST is a Python project, relying on pycparser¹ for
parsing C files, and networkx² for parsing GraphML-based witnesses. As
opposed to the harness-only solutions of other witness-to-test validators [3],
CONCURRENTWITNESS2TEST also needs to modify the AST of the C file to insert
the function calls to yield and release; therefore, the intermediate output of
CONCURRENTWITNESS2TEST consists of a patched C file and a separate test
harness. We use gcc³ to compile these resulting files to an executable. We run
this executable at most 100 times, with an option for early termination once the
error becomes observable. See Fig. 5 for an overview of this workflow.

¹ https://github.com/eliben/pycparser
² https://networkx.org

Fig. 5: Architecture of CONCURRENTWITNESS2TEST
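The repeated execution with early termination might look as follows. This sketch assumes that the compiled harness signals the error state through a non-zero exit code and receives the command line of the executable as input; the function name is hypothetical, and compiling the patched C file and the harness with gcc is not shown.

```python
import subprocess
import sys
from typing import List, Sequence


def run_harness(cmd: Sequence[str], max_runs: int = 100,
                stop_early: bool = True) -> List[bool]:
    """Run the harness up to max_runs times; True marks a run that reached the error.

    Assumption: a non-zero exit code means the error state was reached.
    """
    outcomes = []
    for _ in range(max_runs):
        completed = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                   stderr=subprocess.DEVNULL)
        reached_error = completed.returncode != 0
        outcomes.append(reached_error)
        if stop_early and reached_error:
            break  # error observed at least once
    return outcomes


if __name__ == "__main__":
    # Stand-in "harness" that always exits with status 1, i.e., reaches the error.
    demo_cmd = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(run_harness(demo_cmd))  # stops after the first run
```

With early termination enabled, the loop only establishes whether the error was observed at least once. This suffices for the SV-COMP’24 rule of refusing only never-observable witnesses, but distinguishing always from sometimes observable requires running all iterations.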


3 Discussion of Strengths and Weaknesses of the Approach


As seen in Table 1, CONCURRENTWITNESS2TEST lacks support for some tools' witnesses. Since then, this limitation has been mostly rectified, but not in time for SV-COMP. The main shortcoming of the competition version of CONCURRENTWITNESS2TEST was the handling of cases where edge attributes were given for complex syntactic elements, such as loops: we inserted the function calls into the heads of loops instead of their bodies. This was an easy fix, and we hope to extend support for various tools even further for next year's SV-COMP.

Despite these temporary shortcomings, CONCURRENTWITNESS2TEST still correctly confirmed 1197 results⁴. In contrast, the validator was wrong only 239 times: 2 witnesses were confirmed and 237 witnesses were refused erroneously⁵. These numbers highlight the strength of our approach.

We also note that CONCURRENTWITNESS2TEST confirmed 932 results with only a sometimes observable verdict. This means that multiple tools produce nondeterministic witnesses, where some interleavings lead the execution to an error state, but not all. We suggest that tool developers concentrate on providing deterministic witnesses so that their results can always be validated. In future competitions, we aim to restrict our acceptance criterion to always observable verdicts.


³ https://gcc.gnu.org/

⁴ Unofficial results, since no official results had been published at the time of writing.
⁵ Here, erroneous covers all cases in which the tool could not reproduce the bug. Therefore, these are not necessarily shortcomings of our tool, but may result from bad witnesses.
Table 1: Results per supported tool, results for wrong verdicts in parentheses

            DARTAGNAN  DIVINE   ETA   UAUTOMIZER  UGEMCUTTER  UTAIPAN
Confirmed   178        179 (2)  191   186         235         228
Refused     79         25 (1)   8     74          22          29
Error       193        111      96    168         194         170


4 Tool Setup and Configuration 


The binary archive available at Zenodo [1] contains all required dependencies in the form of a virtual environment, except for the Python 3 interpreter, which needs to be installed separately (e.g., via the python3 package on Ubuntu 22.04).

The tool can be started either directly via the main.py file or via the convenience script start.sh. Either way, the tool expects two inputs: an argument providing the (preprocessed) C file, and the witness file passed with the --witness <file> flag. Upon success, the tool always outputs a single line starting with the string Verdict:, directly followed by the verdict SOMETIMES, ALWAYS, or NEVER. Some handled exceptions also appear as verdicts.
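A script consuming the tool's output could map the Verdict: line to a validation outcome along these lines. This is a hypothetical sketch: the function outcome_of_line and its enum are our own, the confirm/refuse mapping follows the SV-COMP'24 policy of refusing only never observable verdicts, and any other word after Verdict: is treated as a handled exception.

```c
#include <assert.h>
#include <string.h>

typedef enum {
    OUTCOME_CONFIRMED, OUTCOME_REFUSED, OUTCOME_OTHER, OUTCOME_NO_VERDICT
} Outcome;

/* Map one output line of the tool to a validation outcome. */
static Outcome outcome_of_line(const char *line) {
    static const char prefix[] = "Verdict:";
    const size_t plen = sizeof prefix - 1;
    assert(line != NULL);
    if (strncmp(line, prefix, plen) != 0)
        return OUTCOME_NO_VERDICT;
    line += plen;
    while (*line == ' ')
        line++;                              /* tolerate an optional space */
    size_t len = strcspn(line, " \r\n");     /* the verdict word ends here */
    if ((len == 6 && strncmp(line, "ALWAYS", 6) == 0) ||
        (len == 9 && strncmp(line, "SOMETIMES", 9) == 0))
        return OUTCOME_CONFIRMED;
    if (len == 5 && strncmp(line, "NEVER", 5) == 0)
        return OUTCOME_REFUSED;
    return OUTCOME_OTHER;                    /* e.g., a handled exception */
}
```

For example, "Verdict: SOMETIMES" maps to OUTCOME_CONFIRMED and "Verdict: NEVER" to OUTCOME_REFUSED.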

Up-to-date badges on verification tool support can be seen on the main GitHub page⁶. Tool support has been significantly enhanced since the version nominated for the competition, both in preparation for next year's SV-COMP and for developers who want to improve their tools' witnesses in the meantime.


5 Software Project and Data Availability 


CONCURRENTWITNESS2TEST is a validation tool maintained by the Critical Systems Research Group⁷ of the Budapest University of Technology and Economics. The project is available open-source on GitHub⁸ under an Apache 2.0 license. The version (1.0.0) used in the competition is available at [1].
Abstract. GOBLINT is an abstract interpretation framework for C programs with a specialty in concurrency. Using a novel approach, we turn it into a validator of YAML correctness witnesses for all SV-COMP categories. We describe its results at SV-COMP 2024, which include the first large-scale evaluation of our validator.


1 Validation Approach 


GOBLINT VALIDATOR is an extension of the GOBLINT verifier [14-16] for validation 
of correctness witnesses in the YAML format [1], consisting of location and loop 
invariants. The extension involves two related but independent components: 
witness invariants are checked for correctness and unassumed for speedup. We 
present here a high-level overview of our recently published approach to abstract-interpretation-powered witness validation [17].

Correctness of witness invariants is determined by treating them as additional 
proof obligations. However, instead of inserting assert statements into the program, 
the validator uses the GOBLINT verifier as a black box to check whether its 
computed abstract states satisfy the witness invariants. Hence, invalid witness 
invariants cannot undermine soundness of the verification process via refinement. 
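For a non-relational interval domain, checking a witness invariant against the computed abstract state amounts to a containment test. The C sketch below assumes one interval per variable and invariants of the form lo <= x && x <= hi; the names are hypothetical, and GOBLINT itself is written in OCaml with more general checks. A failed check only means the invariant is unconfirmed: since the analysis over-approximates, a true invariant may still fail to be confirmed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical non-relational abstract value: the interval [lo, hi]. */
typedef struct { long lo, hi; } Interval;

/* A witness invariant "lo <= x && x <= hi" is also kept as an interval.
   It is confirmed if every concrete value the analysis admits satisfies it. */
static bool invariant_confirmed(Interval computed, Interval invariant) {
    assert(invariant.lo <= invariant.hi);
    return invariant.lo <= computed.lo && computed.hi <= invariant.hi;
}
```

For example, a computed value x in [0, 100] confirms the invariant 0 <= x && x <= 1000, but not 0 <= x && x <= 50.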

Speedup from witness invariants is attained by incorporating novel unassume 
statements with the invariants into the program. As opposed to refining the 
abstract state like assume operations, these relax the state instead. When this is done in a controlled manner, fixpoint iteration can converge faster, i.e., in fewer iterations.
In the best case, the witness invariant precisely characterizes the fixpoint, avoiding 
further iteration. Unassuming can also make the abstract interpreter more precise, 
without requiring more expressive abstract domains, by leading the solver to a 
more precise fixpoint, which widening would otherwise extrapolate over [17]. 


* Jury member 
Sound unassume operators must preserve all reaching concrete states, thus
preserving soundness of the entire analysis. GOBLINT VALIDATOR implements 
two different unassume operators: 


1. For non-relational domains (e.g., numeric intervals or points-to sets), a classic 
propagating algorithm for assume operators [4, 7] is adapted with minimal 
modifications. This admits relaxing abstract values in dynamically allocated 
memory through pointers. 

2. For relational domains (e.g., octagons), dual-narrowing [8] is employed to 
retain more relations than a generic unassume operator definition [17]. 


2 Software Architecture 


GOBLINT VALIDATOR builds on the GOBLINT verifier [14-16] which is imple- 
mented in OCAML, uses an updated fork of CIL [12] as its frontend and APRON [9] 
for relational domains. 

Instead of altering the control-flow graphs, unassume statements are inserted 
implicitly as events that activated analyses can handle. In the modular architec- 
ture of GOBLINT [2] the unassume analysis is responsible for emitting these events 
after transfer functions corresponding to witness invariants. Widening tokens [10] 
are used to delay widening and allow the invariants to be incorporated without 
immediate precision loss. The solution of a side-effecting constraint system [3, 18] 
is post-processed to validate witness invariants and determine the verdict. 


3 Strengths and Weaknesses 


Overall, GOBLINT VALIDATOR inherits the strengths and weaknesses of GOBLINT, which are described in its tool papers [14-16]. Thanks to the generic validation approach, the validator works in the same SV-COMP categories as the GOBLINT verifier, including those that are currently excluded from correctness witness
validation, e.g., concurrency. Due to over-approximation, the verifier can only 
prove the absence of bugs, but not their presence. Consequently, the validator 
can currently only confirm correctness witnesses. However, it could be extended 
to reject violation witnesses in the future. 

We evaluate our validator according to the same three aspects considered 
by Beyer et al. [6]: same-framework consistency, content-effort dependence and 
cross-framework validation. The first two only focus on witnesses produced by 
the GOBLINT verifier. 

Regarding same-framework consistency, Table 1 lists, for each property, how many tasks GOBLINT can verify and how many of those witnesses GOBLINT VALIDATOR can confirm. The overall average confirmation rate of 78% is lower than the 90% that Beyer et al. [6] report for CPACHECKER and UAUTOMIZER with GraphML witnesses. Reasons for unconfirmed witnesses range from excessive precision loss by unassuming to validator crashes. In some cases, the validator exceeds resource limits, likely due to large witnesses with many unhelpful invariants. A
Table 1. Number of tasks verified by GOBLINT and their witness validation verdicts by GOBLINT VALIDATOR, grouped by property.

                    Correct   GOBLINT    GOBLINT VALIDATOR
Property            tasks     verified   Confirmed      Unconfirmed
unreach-call        11,351    1,894      1,064 (56%)    830
no-overflow         5,562     3,932      3,416 (87%)    516
termination         1,536     619        297 (48%)      322
no-data-race        781       695        510 (73%)      185
valid-memsafety     2,796     1,963      1,801 (92%)    162
valid-memcleanup    2         0          -              -
Total               22,028    9,103      7,088 (78%)    2,015


[Figure 1: scatter plot; y-axis: CPU time for GOBLINT VALIDATOR (s)]
Fig. 1. CPU time scatter plot where each mark (in blue) indicates a task verified by GOBLINT and whose witness was confirmed by GOBLINT VALIDATOR. Ordinary least squares (OLS) regression (in red) follows y = 0.76x + 0.14 (r² = 0.94).


Table 2. Percentage of witnesses from other verifiers confirmed by GOBLINT VALIDATOR.

                                           ULTIMATE
Verifier    CPACHECKER  CPV  MOPSA  AUTOMIZER  GEMCUTTER  KOJAK  TAIPAN
Confirmed   8%          6%   78%    46%        60%        57%    51%
handful of instances indicate mismatches between witness generation and their interpretation due to implementation errors in either the verifier or the validator. Fixing such issues could improve the overall quality of the framework [6].

Regarding content-effort dependence, Fig. 1 plots the corresponding verification and validation times in the 7,088 confirmed cases. While the results at the low end (< 1 s) are noisy, the results at the high end (> 5 s) show the benefit of witness validation, with up to 10x improvements. Regression analysis estimates an average speedup of 24%, which matches our previous results [17], albeit with greater variance. This is unlike CPACHECKER and UAUTOMIZER, for which no general performance improvement from consuming witnesses was observed [6].

Regarding cross-framework validation, Table 2 presents the confirmation rate of GOBLINT VALIDATOR for correctness witnesses from other tools. For the ULTIMATE tool family, the percentage is between 46% and 60%, which is similar to what Beyer et al. [6] observed. The rate is high for the MOPSA abstract interpreter [11], although it only produces trivial witnesses containing no invariants, on which GOBLINT VALIDATOR effectively reduces to the GOBLINT verifier. Nevertheless, the overwhelming success of MOPSA in the SoftwareSystems category warrants independent validation of abstract interpretation results.
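The regression in Fig. 1 is an ordinary least squares fit. A minimal C version is shown below; the helper names are hypothetical, and reading 1 - slope as the average speedup is our own interpretation of how a slope below 1 relates to the reported 24% (it ignores the intercept, which matters only for short runs).

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double slope, intercept, r2; } Fit;

/* Ordinary least squares fit of y = slope * x + intercept over n >= 2 points
   (assumes the x values are not all equal and the y values are not all equal). */
static Fit ols_fit(const double *x, const double *y, size_t n) {
    assert(n >= 2);
    double mx = 0.0, my = 0.0;
    for (size_t i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
    mx /= (double)n;
    my /= (double)n;
    double sxx = 0.0, sxy = 0.0, syy = 0.0;
    for (size_t i = 0; i < n; i++) {
        double dx = x[i] - mx, dy = y[i] - my;
        sxx += dx * dx;
        sxy += dx * dy;
        syy += dy * dy;
    }
    Fit f;
    f.slope = sxy / sxx;
    f.intercept = my - f.slope * mx;
    f.r2 = (sxy * sxy) / (sxx * syy);  /* squared Pearson correlation */
    return f;
}

/* A slope below 1 means validation needs roughly that fraction of the
   verification time, i.e., an average speedup of 1 - slope. */
static double average_speedup(Fit f) { return 1.0 - f.slope; }
```

Fed with points lying exactly on y = 0.76x + 0.14 (synthetic data, not the competition measurements), ols_fit recovers slope 0.76, intercept 0.14, and r² = 1, so average_speedup returns 0.24.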


4 Tool Setup and Configuration


GOBLINT VALIDATOR version svcomp24-0-gc2e9465a7 took part in all categories of SV-COMP 2024 [5, 13] except FalsificationOverall. It is available in both binary (Ubuntu 22.04) and source code form at our GitHub repository.¹ Instructions for building from source can be found in the README.

The tool-info module for BENCHEXEC is named goblint and the benchmark
definition for SV-COMP is goblint-validate-correctness-witnesses-2.0. 
They correspond to running the tool as follows: 


./goblint --conf conf/svcomp24-validate.json \ 
--set witness.yaml.unassume witness.yml \
--set witness.yaml.validate witness.yml \ 
--set ana.specification property.prp input.c 


5 Software Project and Contributors 


GOBLINT VALIDATOR development takes place alongside GOBLINT on GitHub, 
while related publications are listed on its website.² It is an MIT-licensed project
initiated by Technische Universität München and the University of Tartu. 


Acknowledgments. This work was supported by Deutsche Forschungsgemeinschaft 
(DFG) — 378803395/2428 CONVEY 2. We would like to thank everyone who has 
contributed to the GOBLINT framework over the years, laying the foundation for our 
validator. 


¹ https://github.com/goblint/analyzer/releases/tag/svcomp24
Data Availability Statement. All data of SV-COMP 2024 are archived as described
in the competition report [5] and available on the competition website. This includes 
the verification tasks, results, witnesses, scripts, and instructions for reproduction. The 
version of GOBLINT as used in the competition is archived on Zenodo [13]. 
Abstract. WITCH 3 is a new validator of violation witnesses in the witness format 2.0. Note that our previous tool, SYMBIOTIC-WITCH 2, can validate only violation witnesses in the old GraphML format. WITCH 3 validates witnesses of reachability of an error function, overflows, and invalid dereferences and deallocations. Similarly to SYMBIOTIC-WITCH 2, the tool is based on symbolic execution and uses parts of the SYMBIOTIC framework. Support of the witness format 2.0 in WITCH 3 includes features not supported by SYMBIOTIC-WITCH 2, such as constraints on the program variables and function return values, specifying statements by column, and providing the concrete statement in which the violation occurs. These additional features can further restrict the explored state space and, more importantly, allow for much more precise validation.


1 Witness Validation Approach 


WITCH 3 is a new validator of violation witnesses based on symbolic execution. It extends the line of validators SYMBIOTIC-WITCH [1] and SYMBIOTIC-WITCH 2 [2], which are used to validate violation witnesses in the GraphML witness format [6] (now called 1.0). The main difference of WITCH 3 is that it processes witnesses in the witness format 2.0¹ [3] (also known as "the YAML format"). Since this format is based on witness segments and waypoints, as opposed to the witness automata of the GraphML format, the validation process differs considerably from that of SYMBIOTIC-WITCH 2.

Since the tool performs symbolic execution on the LLVM IR [9] of the in- 
put program and some information may be lost during the compilation, we first 
preprocess both the witness and the input program. The preprocessing entails 
wrapping the branching conditions in the program with a special function so 
that the condition is not decomposed or flipped during compilation. This ensures 
that the conditions in the branching statements and the corresponding branches 
are correctly mapped to those described in the witness. Another crucial step is 
adjusting the witness so that the identifiers of the waypoints will be preserved 


* This work has been supported by the Czech Science Foundation grant GA23-065068. 
** Jury member of SV-COMP 2024 
¹ Description is available at https://gitlab.com/sosy-lab/benchmarking/sv-witnesses.
in the debug information in the LLVM program. In this phase we also inject the
constraints from the assumption waypoints into the input program as calls to 
a special function __VALIDATOR_assume which will be handled later. After this 
preprocessing, the tool compiles the program into LLVM IR and runs the internal 
validator WITCH-KLEE on the LLVM program and the preprocessed witness. 
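The following C fragment illustrates what the preprocessing might produce for a single branch and one assumption waypoint. It is a hypothetical sketch: witch_branch_cond is an invented name for the special wrapper function (the text does not name it), the (segment, condition) signature of __VALIDATOR_assume is assumed, and both are given trivial native definitions so that the fragment runs; in WITCH 3 these calls are instead interpreted by the symbolic executor. Identifiers starting with two underscores are formally reserved in C; we keep the name from the text for recognizability.

```c
#include <assert.h>
#include <stdbool.h>

/* Native stand-ins so the sketch compiles and runs. */
static int wrapped_evaluations = 0;
static int assumes_seen = 0;

/* Hypothetical wrapper: passing the whole condition through an opaque call
   keeps it as one value, so compilation does not split or negate it. */
static bool witch_branch_cond(bool cond) {
    wrapped_evaluations++;
    return cond;
}

/* Name from the text; the signature (segment id, condition) is assumed. */
static void __VALIDATOR_assume(int segment, bool cond) {
    assert(segment >= 0);            /* segment ids are non-negative */
    (void)cond;                      /* natively only counted, not enforced */
    assumes_seen++;
}

/* Original:      if (x > 10 && y != 0) return 1; return 0;
   Preprocessed, with an assumption waypoint "x >= 0" in segment 0: */
static int preprocessed(int x, int y) {
    __VALIDATOR_assume(0, x >= 0);
    if (witch_branch_cond(x > 10 && y != 0))
        return 1;
    return 0;
}
```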


The validator begins symbolic execution at the entry point of the program, associating this state with the first segment of the witness. Throughout the process,
each state of symbolic execution is associated with one witness segment. 


Whenever the tool executes an instruction that could be associated with a waypoint of type function_enter, function_return, or branching (i.e., a function call, function return, or branching instruction), it is checked whether this
instruction matches a waypoint of the associated segment and the corresponding 
constraint is enforced on the state. More precisely, if the instruction matches 
the type and location of an avoid waypoint in the segment, the negation of the 
constraint in the witness is added to the path condition of the state to guarantee 
that the waypoint is avoided. If the path condition is not satisfiable, the current 
state of symbolic execution is terminated. Note that this is always the case for 
waypoints of type function_enter, as their fixed constraint true is negated into 
false. If the instruction matches the follow waypoint of a segment, we add the 
given constraint to the path condition and the witness traversal moves to the 
next segment. 

The assumption waypoints are handled slightly differently. Since the con- 
straints are already injected in the program, what remains is to enforce them 
at the right time. Hence, whenever a __VALIDATOR_assume call is executed, the 
tool checks whether the current state of symbolic execution is associated with 
the corresponding segment. If it is not, the call is ignored and symbolic execution 
continues normally. Otherwise, for a follow waypoint, the tool adds the constraint 
to the path condition of the state and moves to the next segment. For an avoid 
waypoint, we enforce the negation of the constraint in a similar manner. If the 
resulting path condition is not satisfiable, the state is terminated. 


If the symbolic executor detects a property violation, the tool investigates 
whether the violating instruction matches the target waypoint, which is the last 
waypoint of the violation witness. If the segment associated with the violating 
state is not the last, the tool terminates the current state but continues exploring 
other states of symbolic execution. This is also the case if the associated segment 
is the last but the target waypoint of the segment does not match the instruction 
violating the property. Otherwise, i.e., if the witness traversal reached the target 
waypoint, WITCH 3 confirms the witness by reporting false. 

If the exploration ends without confirming the witness, there are two possible results. Normally, WITCH 3 outputs true to refute the witness. However, the symbolic executor used by WITCH 3 may replace a symbolic value by a concrete one due to an unsupported feature (for example, it does not support symbolic floats). This substitution may leave some execution paths unexplored, so a valid witness could be refuted. Hence, in such instances, witness refutation is suppressed and WITCH 3 reports unknown.
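The decision rules above reduce to two small functions. The sketch is hypothetical and abstracts away how the target waypoint is matched; concretized stands for "some symbolic value was replaced by a concrete one during exploration".

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { VERDICT_FALSE, VERDICT_TRUE, VERDICT_UNKNOWN } Verdict;

/* On a property violation: confirm only if the state is associated with the
   last segment and the violating instruction matches its target waypoint.
   Otherwise the state is terminated and exploration continues. */
static bool violation_confirms(int segment, int last_segment, bool matches_target) {
    assert(segment <= last_segment);  /* traversal never passes the last segment */
    return segment == last_segment && matches_target;
}

/* Final verdict of the validator. */
static Verdict final_verdict(bool confirmed, bool concretized) {
    if (confirmed) return VERDICT_FALSE;      /* witness confirmed */
    if (concretized) return VERDICT_UNKNOWN;  /* paths may be missing: no refutation */
    return VERDICT_TRUE;                      /* witness refuted */
}
```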
2 Strengths and Weaknesses


The main strength of WITCH 3 is its support of all features of the format 2.0. This includes enforcing constraints on the values of program variables. These constraints can also be included in GraphML witnesses, but they are ignored by both SYMBIOTIC-WITCH and SYMBIOTIC-WITCH 2, with the exception of equality constraints on the return values of __VERIFIER_nondet_* functions. These tools also ignore the attribute offset (replaced by column in the witness format 2.0), which specifies the exact location of an instruction on a given program line. Such shortcomings of our older validators can lead to incorrectly validated witnesses and more unknown results. In contrast, full support of the new format allows WITCH 3 to produce much more reliable results. Moreover, even in cases where our older validators produce a correct result without using the constraints provided in the witness, WITCH 3 can use the constraints to reduce the explored state space and thus speed up the validation.

On the negative side, the witness format 2.0 currently supports only witnesses of reachability of an error function, overflows, and invalid dereferences and deallocations. Hence, WITCH 3 can only be used in these categories. Once the format is extended to more properties, we plan to support them as well.

Another shortcoming is that the tool currently requires the exact location 
of a waypoint, including the optional column number. This does not cause any 
incorrect results since the validation fails in the case of missing information. 
Moreover, as of SV-COMP 2024, all tools which produced violation witnesses in 
the format 2.0 included this information. Despite this, we consider it a weakness 
and plan to fix it in future versions of the tool. 

WITCH 3 also inherits weaknesses from the technology that it uses. The fact 
that the symbolic executor works with programs in LLVM requires more prepro- 
cessing on the program and the witness to ensure that no crucial information is 
lost during the translation. For this reason, there are cases in which the valida- 
tion process may be slower. Additionally, the program may contain some inner 
nondeterminism, such as an unspecified order of evaluation, which is resolved 
during the compilation. If this order differs from the one prescribed by the witness, the witness may be incorrectly refuted. However, we have not yet found any
such incorrect result in practice. Most incorrect results stem from technical errors 
such as missing models of library functions — these functions are then treated 
as nondeterministic and pure, which may not be the case. 


3 Software Architecture 


WITCH 3 can be divided into two components: SYMBIOTIC [8], which is used as 
a wrapper for the second component, and WITCH-KLEE, a witness validator for 
LLVM programs. 

For the purposes of this tool, we extended SYMBIOTIC 9 with scripts for pre- 
processing the witness and the program as previously described. It also compiles 
the program into LLVM, links necessary function models, and parses the output 
of the internal validator, WITCH-KLEE. 
WITCH-KLEE takes the program in LLVM IR and the preprocessed witness and performs the validation. The tool is based on the symbolic executor JETKLEE, a fork of KLEE [7] developed for the purposes of SYMBIOTIC. WITCH-KLEE uses the YAML-CPP² library to parse the witness in the YAML format and
Z3 [10] as the SMT solver in symbolic execution. 

Both components of WITCH 3 use LLVM 10.0.1. 


4 Tool Setup and Configuration 


The archive containing WITCH 3 as it participated in SV-COMP 2024 is avail- 
able on Zenodo [4]. The validation is invoked by the command 


./symbiotic [--prp <prop>] [--32 | --64] --witness-check <witness> <prg> 


where <prop> is the considered property, the switches --32 and --64 spec- 
ify the considered architecture, <witness> is a violation witness in the YAML 
format, and <prg> is the input C program. The property can be provided either 
as a .prp file or one of the shortcuts no-overflow and valid-memsafety. The 
default setting is the property of unreachability of the function reach_error and 
the 64-bit architecture. 

The version of SYMBIOTIC used by WITCH 3, as well as the internal validator 
WITCH-KLEE, are available on GitHub (see below) under the tag svcomp24. To 
build WITCH 3 from its sources, build each of the components separately. To run 
the validator, add the location of the WITCH-KLEE executable to $PATH and use 
the command as presented above. 


5 Software Project and Contributors 


Both components of WITCH 3 are available on GitHub. The source code of the 
validator WITCH-KLEE is available at 


https://github.com/ayazip/witch-klee


and the source code of the version of SYMBIOTIC used by WITCH 3 can be 
found at 


https://github.com/ayazip/symbiotic/tree/witch-klee.


The tool has been developed at the Faculty of Informatics of Masaryk University by Paulina Ayaziová under the supervision and with the advice of Jan Strejček.
It is available under the MIT license and all internally used tools and libraries 
(LLVM, JETKLEE, Z3, YAML-CPP, SYMBIOTIC) are available under open-source 
licenses that comply with SV-COMP's policy for the reproduction of results. 


² https://github.com/jbeder/yaml-cpp
Data Availability Statement. All data of SV-COMP 2024 are archived as described
in the competition report [5] and available on the competition web site. This includes 
the verification tasks, results, witnesses, scripts, and instructions for reproduction. The 
version of WITCH 3 used in the competition is archived on Zenodo [4]. 
Abstract. AISE is a static verifier that can verify safety properties
of C programs. The core of AISE is a program verification framework 
that synergizes abstract interpretation and symbolic execution in a novel 
manner. Compared to the individual application of symbolic execution 
or abstract interpretation, AISE has better efficiency and precision. The 
implementation of AISE is based on KLEE and CLAM. 


Keywords: Abstract Interpretation - Symbolic Execution - Program 
Verification. 


1 Verification Approach 


Given a program P and a property φ, a software verification technique or tool checks whether P satisfies φ, i.e., whether all behaviors (e.g., the program paths) of P satisfy φ. If P does not satisfy φ, a counterexample (e.g., a program input) is given to demonstrate the violation of φ. Many software verification techniques and tools have been developed and successfully applied in different areas [3, 7, 17, 18, 20].

AISE is a software verifier that verifies C programs with respect to reachability properties [5]. AISE’s key idea is to synergize symbolic execution (SE) [4, 21] and abstract interpretation (AI) [10, 11]. In the main loop, our tool performs symbolic execution to analyze the program under verification. However, SE faces the path explosion problem when the program contains loops, which makes it infeasible for sound verification. AI over-approximates the program's behavior and can automatically infer invariants at different program locations, which can be used to verify the property. However, the imprecision caused by over-approximation may result in false positives. AISE combines these two techniques synergistically to improve the scalability of verification as much as possible while ensuring precision. During SE, AISE runs AI online to verify parts of the program, and the results are used to prune safe paths. Conversely, SE also improves the precision of AI. AISE only reports violations detected by SE.


* Jury member 


© The Author(s) 2024 
Fig. 1: AISE's verification framework


2 Framework 


Figure 1 shows AISE's framework, which consists of an AI module and an SE module. The two modules help each other by exchanging control-flow graphs (CFGs) and verification results. On the one hand, SE constructs the sub-CFG on which AI is carried out; on the other hand, the verification results of AI are returned to SE to prune redundant paths, i.e., paths that are guaranteed to satisfy the property.


2.1 Symbolic Execution Module 


The SE module takes a C program as input and then executes the program with
symbolic inputs. The SE procedure is a state-forking procedure [4]. The whole 
process is as follows. At the beginning of execution, the SE module constructs an 
initial symbolic state for the input program. As the state is executed, the data 
of the state is changed by executing instructions one by one. When the state 
encounters a branch instruction, a new state is forked based on the original state. 
A global state pool containing all forked states is maintained. After executing an 
instruction, the current state is paused, and another state is selected from the 
state pool to execute. When a state is terminated (i.e., a state after executing 
the program exit instruction), AISE constructs a sub-CFG that contains all the 
instructions and the edges of the execution path that led to the state and carries 
out AI on the sub-CFG. Based on the AI verification result, the state pool
is updated, i.e., adding the newly forked states or removing the pruned states. 
When SE finds a violation of an assertion, AISE reports the violation. 


2.2 Abstract Interpretation Module 


The AI module takes a sub-CFG as input and outputs safe or unsafe. Given an abstract domain [11], the AI module analyzes the sub-CFG to produce an invariant at each program location, which describes the constraints on the variables at that location. Then, based on the invariant I, we check the property φ by checking the validity of I ⇒ φ. If all assertions are verified, AISE can prune
states that can only reach edges in the sub-CFG. Intuitively, all possible paths starting from these states are contained in the sub-CFG, so they are all safe. Therefore, we can prune all states from which only the nodes and edges of the sub-CFG can be reached. Pruning states reduces the path space of SE and improves the scalability of verification.
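The pruning criterion can be sketched as a reachability check on the CFG. The sketch rests on an assumption not spelled out in the text: a state qualifies only if the edges it has already taken also lie in the sub-CFG, so that its whole path, not just its future, is covered by the AI result. The CFG encoding and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_NODES 16
#define MAX_EDGES 32

/* Hypothetical CFG: edge e goes from src[e] to dst[e]; edge sets are bit masks. */
typedef struct {
    int n_nodes, n_edges;
    int src[MAX_EDGES], dst[MAX_EDGES];
} Cfg;

/* True if every edge reachable from `node` lies in the sub-CFG `sub`. */
static bool only_reaches_sub_cfg(const Cfg *g, uint32_t sub, int node) {
    assert(g->n_nodes <= MAX_NODES && g->n_edges <= MAX_EDGES);
    bool seen[MAX_NODES] = {false};
    int stack[MAX_NODES], top = 0;
    stack[top++] = node;
    seen[node] = true;
    while (top > 0) {
        int u = stack[--top];
        for (int e = 0; e < g->n_edges; e++) {
            if (g->src[e] != u) continue;
            if (!(sub & (UINT32_C(1) << e))) return false;  /* leaves the sub-CFG */
            int v = g->dst[e];
            if (!seen[v]) { seen[v] = true; stack[top++] = v; }
        }
    }
    return true;
}

/* Assumed criterion: edges already taken and all continuations stay inside
   the sub-CFG that AI has proved safe. */
static bool prunable(const Cfg *g, uint32_t sub, uint32_t taken, int node) {
    return (taken & ~sub) == 0 && only_reaches_sub_cfg(g, sub, node);
}
```

Encoding the program of Fig. 2 with one node each for lines 2-3, 4, 5, 6, 7, 8, and 11 plus the exit, and taking as sub the edges of the explored path, a state waiting at the loop head (line 7) is prunable, whereas a state that took the early return at line 5 or skipped the loop because y <= 0 is not.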


2.3 Example 


Figure 2 gives an example¹ to illustrate the idea of AISE. This program contains a loop adding y to x. AI using the interval domain fails to verify the assertion because y's invariant at line 11 is (−∞, 1000000], which is not sufficient to prove the assertion. SE can verify this program by exploring all paths, but this takes a long time because the program has numerous paths.


int main() {
  int x = __VERIFIER_nondet_int();
  int y = __VERIFIER_nondet_int();
  if (!(y <= 1000000))
    return 0;
  if (y > 0) {
    while (x < 100) {
      x = x + y;
    }
  }
  assert(y <= 0 || (y > 0 && x >= 100));
  return 0;
}

Fig. 2: C code segment

[Figure: sub-CFG along the explored path with nodes x=__VERIFIER_nondet_int(), y=__VERIFIER_nondet_int(), assume(y <= 1000000), assume(y > 0), the loop head while(x < 100) with body x=x+y, assert(y<=0 || (y>0 && x>=100)), and exit]
Fig. 3: CFG based on a path


AISE can verify this program successfully in a short time. The SE module only needs to explore a few paths because many can be pruned. After the SE module explores the path 2→3→4→6→7→8→7→11→12 (line numbers of Fig. 2), it constructs the sub-CFG in Figure 3 based on this path. For this sub-CFG, the AI module successfully verifies the assertion. Then, the AISE framework updates the state pool in the SE module, killing all the states forked at line 7, i.e., when encountering the loop head. Afterwards, no states remain in the pool and SE terminates with a safe result. This also demonstrates that SE can improve the precision of AI by restricting it to the sub-CFG of a symbolic path.


3 Implementation, Results and Discussion 


AISE’s implementation is based on the AI framework CLAM and the SE tool KLEE [7]. STP [15] is the SMT solver of SE. AISE accepts input in the LLVM [1] intermediate representation. The AI module of AISE uses the polyhedron abstract domain [12], with the implementation from the Apron library [19]. The search strategy of the SE module is nurs:covnew. In addition, AISE integrates ESBMC to handle floating-point programs, because the SE module does not support the analysis of floating-point programs.

AISE participates in the ReachSafety-Loops category of SV-COMP 2024 [6]. Table 1 shows AISE’s results. AISE achieved 847 points in this category, and four tools ranked ahead of it: Bubaak [8], Symbiotic [20], VeriAbs [13], and VeriAbsL [14]. The score-based quantile plot for this category² shows that for CPU times below about 100 s, AISE achieved the highest score among all tools. If the pruning method works, AISE can verify a program in a short time; otherwise, AISE may fail to finish the job. Many of AISE’s failed cases are programs with non-linear expressions, for which AI is limited. Besides, AISE is not efficient at handling large arrays. For example, AISE does not support arrays of symbolic size, a limitation inherited from KLEE.

Table 1: AISE’s results

                  number   time (s)
total tasks       790
score             847
total correct     491      9400
correct true      356      6200
correct false     135      3200
total incorrect   0        0
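The score in Table 1 can be checked against the SV-COMP scoring scheme, which awards 2 points for a correct true result and 1 point for a correct false result, and subtracts 16 and 32 points for incorrect false and incorrect true results, respectively. Since AISE produced no incorrect results, only the correct results contribute:

```latex
\[
  \text{score} = 2 \cdot \underbrace{356}_{\text{correct true}}
               + 1 \cdot \underbrace{135}_{\text{correct false}}
               = 712 + 135 = 847,
  \qquad 356 + 135 = 491 \ \text{correct results}.
\]
```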


4 Software Project, Setup and Contributors 


AISE participates only in the ReachSafety-Loops category of the SV-COMP
benchmarks. AISE is invoked as follows.


./bin/aise <program> 


Here, <program> is the input program. AISE needs only the input program as its
parameter, because all properties in the ReachSafety-Loops benchmarks are
the same, i.e., (unreach-call, ILP32), and these properties are built into AISE.
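The unreach-call property requires that reach_error() is never called. The hypothetical loop task below follows the style of ReachSafety-Loops. In a real task, __VERIFIER_nondet_int() is left undefined and stands for an arbitrary input. The stub definitions and the name task_main are assumptions, added only so the sketch compiles and runs on concrete inputs. A verifier such as AISE instead reasons about all inputs.

```c
#include <assert.h>

/* Stubs (assumption): a verifier treats the first as an arbitrary int and the
   second as the error location; here they are concrete so the sketch runs. */
static int stub_input = 0;
static int error_reached = 0;
int __VERIFIER_nondet_int(void) { return stub_input; }
void reach_error(void) { error_reached = 1; }

/* Hypothetical loop task in the style of ReachSafety-Loops. */
int task_main(void) {
  int n = __VERIFIER_nondet_int();
  if (n < 0 || n > 1000) return 0;  /* restrict the input range */
  int i = 0, s = 0;
  while (i < n) {                   /* linear loop invariant: s == 2 * i */
    s += 2;
    i++;
  }
  if (s != 2 * n) reach_error();    /* unreach-call: must never happen */
  return 0;
}
```

Proving this task safe requires the invariant s == 2 * i. That is a linear equality, so it is the kind of fact a polyhedral analysis can represent.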

AISE is available at https://github.com/zbchen/aise-verifier. AISE is a
prototype project developed by the National University of Defense Technology.
AISE is licensed under GPL 3.0. All people involved in the project are listed as
authors of this paper.


Data-Availability Statement 


AISE's artifact is available at Zenodo: https://doi.org/10.5281/zenodo.10201159
Abstract. BUBAAK-SpLit is a tool for dynamically splitting verification
tasks into parts that can then be analyzed in parallel. It is built
on top of BUBAAK, a tool designed for running combinations of verifiers
in parallel. In contrast to BUBAAK, which directly invokes verifiers
on the inputs, BUBAAK-SpLit first splits the input program
into multiple modified versions called program splits. During the splitting
process, BUBAAK-SpLit uses a weak verifier (in our case, symbolic
execution with a short time limit) to analyze each generated program
split. If the weak verifier fails on a program split, we split it
further and run the weak verifier again on the resulting splits.
We continue splitting until a predefined number of hard-to-verify
program splits has been generated or a splitting limit is reached.
During the main verification phase, we run a combination of BUBAAK-LEE
and SLOWBEAST in parallel on the remaining unsolved parts of the
verification task.


1 Verification approach 


BUBAAK [7] is a program analysis tool that runs multiple verifiers at the same 
time, and uses ideas from runtime monitoring and enforcement [5,10] to mediate 
the communication of useful information between the verifiers, such as invariants 
or already explored parts of the program. As of this year, the verifiers can be
executed in an arbitrary combination of sequential and parallel portfolios, chosen fully
dynamically based on the information learned during the verification process.
With BUBAAK-SpLit, we explore program splitting [12,13] as a way to improve
the scalability of the verification process. The main idea behind program
splitting is to split a given program P into multiple subprograms P_1, ..., P_n,
which can then be analyzed in parallel. As a result, BUBAAK-SpLit can verify
multiple subprograms with multiple verifier instances at the same time.
1  int main(void) {
2    int x = nondet();
3    if( x >= 1000 )
4      abort();
5    if( x <= 10 ){
6      hard_to_verify_1(x);
7    }else {
8      hard_to_verify_2(x);
9    }
10 }

[Diagram: splitting tree for this program; the remaining hard-to-verify splits correspond to hard_..._1(x) and hard_..._2(x).]

Fig. 1. Overview of the verification process of BUBAAK-SpLit for the given example.
BUBAAK-SpLit splits programs that are too hard to be verified by a weak verifier (gray
nodes), stops at easy-to-verify nodes (crossed-out nodes), and proceeds until n
hard-to-verify splits are found (green nodes).


Control-flow Splitting. BUBAAK-SpLit adopts control-flow splitting³ [13] for
splitting programs into subprograms. Control-flow splitting splits a program P
at the first branching point B, creating two subprograms P+ and P-. P+ and P-
represent the program P under the assumption that the branching condition at B
evaluates to true or false, respectively. For example, Figure 2 depicts P+ and
P- obtained by splitting the program in Figure 1 at the first branching point in Line
3. Splitting a program purely syntactically might result in suboptimal splits [12], where
one part of the split is easy to verify and the other remains hard to verify. To
mitigate this problem, BUBAAK-SpLit implements a dynamic splitting strategy.
(1) We first check whether the given program (or split) is hard to verify by running
a weak verifier. (2) If it is hard to verify, we split the program and repeat the
process on the generated splits. (3) If it is not hard to verify, we record the
result of the weak verifier and continue with the other splits (if any). We continue
this process until a fixed number of hard-to-verify splits has been generated or a
splitting limit is reached. If the problem is solved during the splitting process,
we report the results of the weak verifiers.
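The dynamic strategy can be sketched as a recursive procedure. This sketch makes several assumptions:
- A program split is modelled only by whether the weak verifier finishes on it and by its two sub-splits.
- The only splitting limit is a maximum depth, as in the SV-COMP workflow of Figure 3.
- The types and the function dynamic_split are hypothetical and are not BUBAAK-SpLit's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a program split: does the weak verifier (short-timeout
   SE) finish on it, and what are P+ and P- at its first branching point
   (NULL if it has no branch)? */
typedef struct Split {
  const char *name;
  int weak_verifier_solves;
  struct Split *pos, *neg;
} Split;

#define MAX_HARD 16

typedef struct {
  const Split *hard[MAX_HARD]; /* splits handed to the strong verifiers */
  int n_hard;
  int n_solved_by_weak;
} SplitResult;

/* (1) run the weak verifier; (2) if it fails and the depth limit is not
   reached, split and recurse; (3) otherwise schedule the split for the
   main verification phase. */
void dynamic_split(const Split *p, int depth, int max_depth, SplitResult *out) {
  if (p->weak_verifier_solves) {
    out->n_solved_by_weak++;                 /* easy split: keep weak result */
    return;
  }
  if (depth < max_depth && p->pos != NULL && p->neg != NULL) {
    dynamic_split(p->pos, depth + 1, max_depth, out);
    dynamic_split(p->neg, depth + 1, max_depth, out);
    return;
  }
  assert(out->n_hard < MAX_HARD);            /* at most 2^max_depth splits */
  out->hard[out->n_hard++] = p;              /* hard split: strong verifier */
}
```

For the program of Figure 1 with depth limit 2, the split for x >= 1000 is easy. The other split is split again into two hard parts, so two splits reach the main phase.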

Figure 1 provides an example of the splitting process. After splitting two 
times, BUBAAK-SpLit identifies two hard-to-verify splits which are then verified 
by two verifiers in parallel in the main verification phase. Existing static splitting
strategies for C programs [12] might stop after the first split, resulting in a
suboptimal split (with little to no benefit for the verification process).


Verification technology. BUBAAK-SpLit in SV-COMP 2024 uses verifiers
based on forward and backward symbolic execution.

(Forward) symbolic execution (SE) [14] systematically explores a program's
executions from the initial location. Backward symbolic execution (BSE) [8]
explores executions that reach a given (error) location by analyzing
the program backwards from that location. We employ a variant of BSE with
loop folding (BSELF) [8], which allows us to generate loop invariants and prove
programs correct.

³ Our variant of control-flow splitting was mainly inspired by Mooly Sagiv's invited
talk "Scaling Formal Verification to Realistic Code with Applications to DeFi" at
ETAPS 2023. Our implementation, however, splits C programs, not Solidity contracts.

1 int main(void) { // P+
2   int x = nondet();
3   assume( x >= 1000 );
4   abort();
5 }

1 int main(void) { // P-
2   int x = nondet();
3   assume( !(x >= 1000) );
4   if( x <= 10 ) ...
5 }

Fig. 2. Result of splitting the program from Figure 1 at the first branching point.
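Splitting at a branch is sound because the two assumptions partition the inputs: every execution of P is an execution of exactly one of P+ and P- (Figure 2). The sketch below checks this on concrete inputs for the program of Figure 1, with three assumptions:
- The program is abstracted to which part of it a run reaches.
- assume(c) is modelled by reporting the path as infeasible when c is false.
- The enum and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical outcome encoding: which part of the program a run reaches. */
enum outcome { INFEASIBLE, ABORTS, HARD_1, HARD_2 };

/* Original program P from Figure 1, abstracted to its control flow. */
enum outcome run_p(int x) {
  if (x >= 1000) return ABORTS;
  if (x <= 10) return HARD_1;
  return HARD_2;
}

/* P+: assume(x >= 1000); a failed assumption blocks the path. */
enum outcome run_p_plus(int x) {
  if (!(x >= 1000)) return INFEASIBLE;
  return ABORTS;
}

/* P-: assume(!(x >= 1000)), followed by the rest of P. */
enum outcome run_p_minus(int x) {
  if (x >= 1000) return INFEASIBLE;
  if (x <= 10) return HARD_1;
  return HARD_2;
}

/* 1 if, for every x in [lo, hi], exactly one split is feasible and it
   reaches the same outcome as P; 0 otherwise. */
int splits_partition_p(int lo, int hi) {
  for (int x = lo; x <= hi; x++) {
    enum outcome a = run_p_plus(x), b = run_p_minus(x);
    int feasible = (a != INFEASIBLE) + (b != INFEASIBLE);
    enum outcome got = (a != INFEASIBLE) ? a : b;
    if (feasible != 1 || got != run_p(x)) return 0;
  }
  return 1;
}
```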

SE can very quickly decide easy-to-verify problems, so we use it with a short
timeout as the weak verifier during splitting. The strong verifiers in the main verification
phase are selected based on the property. For the property unreach-call, we
run BSELF and SE (with no timeout) in parallel: BSELF to prove programs
correct and SE to (mainly) find bugs. Other properties are not supported by
BSELF. For checking termination properties, we run SE and termination with
inductive invariants with progress (TIIP) [7]. For checking memory safety, we
use only SE. Note that the splitting phase is executed for all properties.
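The property-based selection of strong verifiers amounts to a small lookup. In this sketch, the enum and the function strong_verifiers are hypothetical, and the verifier names are plain labels rather than tool invocations:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical encoding of the properties handled in the main phase. */
enum property { UNREACH_CALL, TERMINATION, MEMORY_SAFETY };

/* Writes the verifiers run in parallel for the property to out[] and returns
   how many there are. */
int strong_verifiers(enum property prop, const char *out[2]) {
  switch (prop) {
  case UNREACH_CALL:   /* BSELF proves correctness, SE mainly finds bugs */
    out[0] = "BSELF"; out[1] = "SE"; return 2;
  case TERMINATION:    /* SE together with TIIP */
    out[0] = "SE"; out[1] = "TIIP"; return 2;
  case MEMORY_SAFETY:  /* SE only */
    out[0] = "SE"; return 1;
  }
  return 0;
}
```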


2 Software architecture 


BUBAAK runs verification tools in a combination of sequential and parallel port- 
folio. The verifiers are not composed into a fixed scheme, but they are invoked 
dynamically based on the information gathered during the verification process. 
In more detail, the architecture of BUBAAK is inspired by process
algebras [4] and is centered around tasks and their rewriting. The tool starts with
the execution of a set of initial tasks; upon finishing, each task either yields a 
result, or rewrites itself into a new task or a set of new tasks. Whenever a task 
rewrites itself into a set of new tasks, it also specifies how the results of the new 
tasks should be aggregated into a single result. The important feature is that 
generating new tasks is not fixed in a static scheme: a task can rewrite itself 
into new tasks based on the context and information hitherto gathered about 
the program during the verification process. 

What tasks are executed and how they are being rewritten is defined by a 
selected workflow. The workflow for splitting in SV-COMP 2024 is depicted in 
Figure 3. It defines the task Split(P) that takes program P and splits it into 
two parts as described in Section 1. This task is invoked as the initial task on 
the input program. After splitting the program, Split rewrites itself into two 
identical tasks CCAndCheckWeak that are invoked on those two splits. As the 
name suggests, the input split is compiled (into LLVM [1]) and the weak verifier
is run on it to check whether the split is easy to solve. If a split is not easy to solve,
the task Split is invoked on it recursively, and this process continues until
a predefined depth is reached, at which point the workflow invokes the strong
verifier instead of splitting further.
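When Split rewrites itself into tasks for P+ and P-, the two verdicts must be aggregated into a verdict for P. Because P+ and P- together cover all executions of P, a natural rule is the following: a violation in either split is a violation of P, and P is safe only if both splits are. This rule is an assumption of the sketch. The text does not spell out BUBAAK's exact aggregation, and the names are hypothetical.

```c
#include <assert.h>

enum verdict { V_TRUE, V_FALSE, V_UNKNOWN };

/* Aggregates the verdicts of the tasks for P+ and P- into a verdict for P. */
enum verdict aggregate_split(enum verdict plus, enum verdict minus) {
  if (plus == V_FALSE || minus == V_FALSE)
    return V_FALSE;                 /* a bug in either split is a bug in P */
  if (plus == V_TRUE && minus == V_TRUE)
    return V_TRUE;                  /* both parts of P are safe */
  return V_UNKNOWN;                 /* some part of P remains unresolved */
}
```

Under this rule, a false verdict from one task already decides P. An implementation could therefore stop the sibling task early.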
Split(P):            P+, P- := split(P)  -->  CCAndCheckWeak(P+)
                                         -->  CCAndCheckWeak(P-)

CCAndCheckWeak(P):   bc := compile(P)    -->  CheckWeak(bc, P)

CheckWeak(bc, P):    bubaak-lee(bc)
                       -- time > 2s ∧ depth < 2 -->  Split(bc, P)
                       -- time > 2s ∧ depth ≥ 2 -->  CheckStrong(bc)

CheckStrong(bc):     bubaak-lee(bc) ∥ slowbeast-bself(bc)   (in parallel)

Fig. 3. The workflow of BUBAAK-SpLit for SV-COMP 2024. For brevity, the scheme
does not show error handling and result propagation.


Workflows are only an abstraction: internally, task execution and rewriting 
is implemented using an event loop that handles events coming from tasks, task 
creation and destruction, and the results aggregation. 

The weak verifier is based on BUBAAK-LEE, and we run a combination of
BUBAAK-LEE and SLOWBEAST during the main verification phase. BUBAAK
and SLOWBEAST are implemented in Python; BUBAAK-LEE is implemented in C++. Both
SLOWBEAST and BUBAAK-LEE use Z3 [9] as the SMT solver.


3 Strengths and Weaknesses 


Program splitting has been shown to improve the runtime efficiency [15] and 
verification effectiveness [11] of symbolic execution engines. By splitting the 
program into several parts, SE and BSE can analyze different parts of the pro- 
gram at the same time, which can lead to results being decided more quickly. 
In SV-COMP 2024 [6], BUBAAK-SpLit solved 60 benchmarks that
BUBAAK was not able to solve, and it solved 456 benchmarks faster, often
significantly. Compared to BUBAAK, BUBAAK-SpLit misses several violations
on the ReachSafety benchmarks. Most of these misses are caused by the severe
limit we place on the execution time of SE during verification. Another problem is
the scalability of our approach in the restricted setting of SV-COMP. By
splitting the program up to n times, we currently run up to 2^n verifiers at the
same time. While in practice this might significantly reduce the wall time, it also
significantly reduces the CPU time available to each verifier. Overall, BUBAAK-SpLit
inherits the strengths of the underlying analyses, which allows the tool
to perform well in the categories ReachSafety and SoftwareSystems. After SV-COMP
2024, we found and fixed several bugs in the implementation of
BUBAAK-SpLit that might have severely limited its performance.
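The CPU-time trade-off can be made concrete. This is a back-of-the-envelope sketch with three assumptions:
- k verifier instances evenly share a per-task CPU budget T.
- The weak-verifier runs during splitting are ignored.
- T is the SV-COMP limit of 15 min (900 s) CPU time per task, and the depth is n = 2 as in Figure 3.

```latex
k = 2^{n}, \qquad
T_{\text{verifier}} = \frac{T}{k} = \frac{T}{2^{n}}, \qquad
n = 2,\; T = 900\,\mathrm{s} \;\Rightarrow\; k = 4,\; T_{\text{verifier}} = 225\,\mathrm{s}.
```

Each additional level of splitting therefore halves the CPU time per verifier, even when enough cores are available to reduce the wall time.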
Open Access This chapter is licensed under the terms of the Creative Commons
Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/),
which permits use, sharing, adaptation, distribution and reproduction in any medium 
or format, as long as you give appropriate credit to the original author(s) and the 
source, provide a link to the Creative Commons license and indicate if changes were 
made. 

The images or other third party material in this chapter are included in the 
chapter's Creative Commons license, unless indicated otherwise in a credit line to the 
material. If material is not included in the chapter's Creative Commons license and 
your intended use is not permitted by statutory regulation or exceeds the permitted 
use, you will need to obtain permission directly from the copyright holder. 
Abstract. CPACHECKER is a versatile framework for software verification,
rooted in the established concept of configurable program analysis. Compared
to the last published system description at SV-COMP 2015, the
CPACHECKER submission to SV-COMP 2024 incorporates new analyses for
reachability safety, memory safety, termination, overflows, and data races.
To combine the strengths of the available analyses in CPACHECKER and cover the
full spectrum of diverse program characteristics and specifications in
the competition, we use strategy selection to predict a sequential portfolio
of analyses that is suitable for a given verification task. The prediction
is guided by a set of carefully picked program features. The sequential
portfolios are composed based on expert knowledge and consist of bit-precise
analyses using k-induction, data-flow analysis, SMT solving, Craig
interpolation, lazy abstraction, and block-abstraction memoization. The
synergy of various algorithms in CPACHECKER enables support for all properties
and categories of C programs in SV-COMP 2024 and contributes
to its success in many categories. CPACHECKER also generates verification
witnesses in the new YAML format.


1 Software Architecture 


CPACHECKER [10] is a flexible framework for automatic software verification based 
on the concept of Configurable Program Analysis (CPA) [9]. Abstract domains 
needed by a verification approach are represented as CPAs, and multiple CPAs can 
be combined in a modular fashion to achieve synergy. CPACHECKER provides basic 
functionalities for program analysis, such as tracking the control flow or callstack, 
as standalone CPAs, which facilitate the implementation of new analyses. Through 
its modular architecture, a rich collection of verification algorithms [7, 12, 14, 24]
has been implemented in CPACHECKER, and its flexibility and extensibility have
been demonstrated by many research projects.

Operating Platform. CPACHECKER is platform-independent, as it is written in
Java. However, its default SMT solver MATHSAT5 [17] is bundled only for Linux.
Thanks to the versatility of the JavaSMT library [23], a different SMT solver
can be chosen on other platforms.
Property?
  Mem. Safety       Symbolic exec. [13] + SMG-based analysis [20]
  No Data Race      Val [14] + memory-access-based POR [25]
  No Overflow       Reduction to reach. safety + PredAbs [22]
  Termination       Liveness-as-safety [26]; lasso-based analysis [18]
  Reach. Safety     Program structure?
    Recursion         PredAbs + Val with BAM [21]
    Concurrency       BDD-based analysis [8]
    Loop-free         BMC [16]; PredAbs
    Single loop       Symbolic exec. [13]; Val; PredAbs; DF [2]; IMC [12]
    Non-int. data     Val; k-induction [6]
    Other             Symbolic exec.; Val; PredAbs; DF; k-induction

Fig. 1: Strategy selection based on the property to verify and program structure
(New components since the last published system description [19] are marked in boldface.
'+' and ';' denote component composition and sequential execution, respectively.)


Witnesses. CPACHECKER produces correctness and violation witnesses for all 
properties where the corresponding witness type is already defined by the com- 
munity. These are exported in the established GraphML format [4,5] as well as 
in the new YAML format that is introduced with SV-COMP 2024. 


2 Verification Approaches 


To effectively solve the verification tasks from the heterogeneous benchmark
set used in the competition, we need different verification strategies. Given a
verification task, we select a suitable strategy with a two-level approach according
to the property of the task and the structure of the program. A strategy can be a
sequential portfolio of different verification techniques, each of which is assigned a
time limit determined by expert knowledge. Figure 1 shows the selection
procedure. The first-level selection is based on the property of the verification
task. If the property is memory safety, no-data-race, no-overflow, or
termination, a dedicated strategy is immediately assigned to solve the task. If the
property is reachability safety, we further classify the program structure of the
task into six classes using a set of carefully picked features, and a tailored strategy
is invoked for each class. The details for each property and program structure are
given below.

Memory Safety. Memory safety is checked by an unbounded analysis based 
on symbolic memory graphs (SMGs) [20]. It utilizes symbolic execution [13] to 
reason over non-concrete values, enabling us to verify the safety of low-level 
memory operations. The graph-based approach allows us to not only represent 
heap memory efficiently, but also to abstract linked memory structures (e.g., 
linked lists) that are created with low-level memory operations. 

No Data Race. Data races are checked with a combination of value analysis 
(Val) [14], the thread handling from our concurrency analysis [8], and a CPA that 
tracks read and write accesses to memory locations. We perform partial order
reduction (POR) [25] over thread-local memory accesses to improve performance.

No Overflow. Overflows are checked with a CPA that adds constraints
for overflow detection and a bit-accurate predicate abstraction (PredAbs) [7].
For recursive tasks, we add block-abstraction memoization (BAM) [21,27], which
summarizes the input-output behavior of recursive functions.
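As an illustration of the kind of constraint involved, the predicate below holds exactly when the signed addition a + b overflows. It is a sketch only. CPACHECKER's actual encoding is a bit-precise formula handed to the SMT solver, and the function name is hypothetical. The check is written without evaluating a + b, which would itself be undefined behavior on overflow.

```c
#include <assert.h>
#include <limits.h>

/* 1 iff a + b overflows the range of int; never evaluates a + b itself. */
int add_overflows(int a, int b) {
  return (b > 0 && a > INT_MAX - b)   /* result would exceed INT_MAX */
      || (b < 0 && a < INT_MIN - b);  /* result would fall below INT_MIN */
}
```

In the reduction to reachability safety listed in Fig. 1, such a condition corresponds to guarding an addition with an error call, as in if (add_overflows(a, b)) reach_error();.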

Termination. Our termination strategy consists of two techniques. The first
technique transforms the liveness property into a safety property [26]. With a combination of
predicate and value analyses, we check whether there exists some program state at
a loop head that can be visited twice. If the program is recursive or the analysis
reaches a time limit of 300 s, we switch to the second technique, which uses ideas
first implemented in TERMINATOR [18]: We apply CPACHECKER's predicate-based
reachability analysis to detect potentially non-terminating program executions,
called candidate lassos. A lasso consists of a stem (a finite program path) that is
followed by a loop (a finite program path that describes a syntactic cycle in the
program). Found candidate lassos are analyzed with the library LASSORANKER [24]
to synthesize termination and non-termination arguments. If a non-termination
argument is found for at least one candidate lasso, a violation of the termination
property is reported. Otherwise, the analysis reports the program as terminating.
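The first technique rests on a simple observation. If a deterministic loop visits the same state at its loop head twice, it runs forever. The concrete-execution sketch below illustrates only this observation, on a single hypothetical loop. CPACHECKER performs the check symbolically with predicate and value analyses rather than by running the program.

```c
#include <assert.h>

/* Hypothetical loop under analysis: while (x != 0) x = (3 * x) % 7; */
static int loop_guard(int x) { return x != 0; }
static int loop_body(int x) { return (3 * x) % 7; }

/* Executes the loop from x0, recording each state seen at the loop head.
   Returns 1 if a state is visited twice (the loop cannot terminate), 0 if
   the loop exits, and -1 if the step bound (at most 64) is exhausted. */
int revisits_loop_head(int x0, int max_steps) {
  int seen[64];
  int n_seen = 0;
  int x = x0;
  for (int step = 0; step < max_steps && step < 64; step++) {
    if (!loop_guard(x)) return 0;          /* loop exits */
    for (int i = 0; i < n_seen; i++)
      if (seen[i] == x) return 1;          /* same loop-head state again */
    seen[n_seen++] = x;
    x = loop_body(x);
  }
  return -1;
}
```

Starting from x0 = 1, the loop-head states cycle through 1, 3, 2, 6, 4, 5 and back to 1. Starting from x0 = 7, the loop exits after one iteration.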

Reachability Safety. For the reachability of an error location, we tailor our
verification strategy based on the structure of the program. If the program 
contains a recursive function, we apply block-abstraction memoization [21,27] in 
combination with value analysis (Val) and predicate abstraction (PredAbs). If the 
program is multi-threaded, a concurrency analysis [8] that relies on binary decision 
diagrams (BDD) is applied. We set an upper limit of five threads for the analysis, 
and if this threshold is surpassed, the analysis is aborted. For non-recursive and 
single-threaded programs, we assign one of the four verification strategies in Fig. 1 
according to the following structural features: the number of loops and whether the 
program contains non-integer data types, such as floating-point variables, arrays, 
or composite data structures [3]. The four strategies are all based on sequential 
combinations [19] of various bit-precise analyses with different time limits. For 
loop-free programs, we apply bounded model checking (BMC) [16] with a fallback 
to PredAbs [22]. For programs with a single loop, we apply a sequence of symbolic 
execution [13], Val [14], PredAbs [11], interval-based data-flow analysis (DF) [2], 
and interpolation-based model checking (IMC) [12]. For programs with multiple 
loops and non-integer data types, we apply Val and k-induction [6]. For all other 
programs, i.e., those with multiple loops but without non-integer data types, 
we apply a sequential portfolio of symbolic execution, Val, PredAbs, DF, and 
k-induction. 
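The second-level selection for reachability safety can be sketched as a decision function following Fig. 1. The feature struct, the order in which the checks are applied, and the strategy strings are assumptions of this sketch. The real feature extraction and the per-analysis time limits are not modelled.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical feature vector of a reachability-safety task. */
typedef struct {
  int has_recursion;
  int is_multithreaded;
  int num_loops;
  int has_non_integer_data;   /* floats, arrays, or composite data */
} Features;

/* Returns the strategy from Fig. 1; ';' separates analyses that run as a
   sequential portfolio. */
const char *select_reach_strategy(Features f) {
  if (f.has_recursion)        return "PredAbs + Val with BAM";
  if (f.is_multithreaded)     return "BDD-based concurrency analysis";
  if (f.num_loops == 0)       return "BMC; PredAbs";
  if (f.num_loops == 1)       return "SymExec; Val; PredAbs; DF; IMC";
  if (f.has_non_integer_data) return "Val; k-induction";
  return "SymExec; Val; PredAbs; DF; k-induction";
}
```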


3 Strengths and Weaknesses 


CPACHECKER with strategy selection performed well in SV-COMP 2024 [1],
winning second place in category Overall and first place in category
FalsificationOverall. Notably, it produced 17968 correct and confirmed results,
more than any other participant, and outperformed the winner of category Overall
by 3296. CPACHECKER is also robust: more than 96 % of its correct results were
confirmed by witness validators, and it produced only 17 wrong results (0.06 % of
all tasks).

CPACHECKER won third place in category ReachSafety by using various
analyses orchestrated by strategy selection. For programs with non-integer data
types, k-induction was the most effective analysis. In programs with loops, most
alarms were found by symbolic execution, and most proofs were delivered by value
analysis and predicate abstraction.¹

The only categories without a medal for CPACHECKER were Termination,
ConcurrencySafety, and MemSafety. In particular, all wrong results in the category
MemSafety are due to imprecise abstractions of nested lists. To address them, we
intend to improve the precision of our list abstraction and incorporate SMT-based
array abstraction, which would make CPACHECKER more effective in this category.
To improve the termination analysis, we plan to make the analyses more
cooperative and to carry over partial proofs in the sequential combination. Additionally,
CPACHECKER needs improvements for finding invariants with quantifiers, which
mainly affects verification tasks with large arrays.


4 Setup and Configuration 


SV-COMP 2024 ran CPACHECKER version 2.3 [15] on all categories with C
programs. It runs on a standard GNU/Linux system with a Java 17 compatible
runtime environment. To start CPACHECKER, execute the following command:


scripts/cpa.sh -svcomp24 -benchmark -heap 10000M -timelimit 900s
-spec property.prp program.i 


For programs assuming a 64-bit memory model, append the argument -64 to the 
command line. At the end of the execution, the verification result is printed to 
the console output and the witnesses are written to the files witness.graphml 
and witness.yml in the directory output/. 

Note that the configuration -svcomp24 is optimized specifically for the resource
limits used in SV-COMP (15 GB of RAM and 15 min of CPU time per task). For
other use cases (e.g., with less RAM or a different time limit), please apply a 
different configuration (e.g., default) and adjust the memory consumption with 
the command-line option -heap as described in the documentation. 


5 Project and Contributors 


More than 100 developers have contributed to CPACHECKER, mainly from LMU
Munich, TU Darmstadt, U Paderborn, U Passau, TU Prague, U Oldenburg, TU
Vienna, ISP RAS, and several other universities and institutes. We would like to
thank all contributors for their investment in CPACHECKER. A complete list and
more information about the project are available at https://cpachecker.sosy-lab.org.
A list of bugs that CPACHECKER found in the Linux kernel is also available.


1 Note that the observations are specific to our sequential portfolios and influenced by 
the orders of analyses in the combination. 
Data-Availability Statement. The tool is available at https://cpachecker.sosy-lab.org
and the version used in SV-COMP 2024 is archived at Zenodo [15].
Abstract. We submit CPV, a circuit-based software verifier for C programs,
to SV-COMP 2024. CPV uses sequential circuits as its intermediate
representation and invokes hardware model checkers to analyze the
reachability safety of C programs. As the frontend, it uses KRATOS2, a
recently proposed verification tool, to translate a C program into a sequential
circuit. As the backend, the state-of-the-art hardware model checkers ABC
and AVR are employed to verify the translated circuits. We configure the
hardware model checkers to run various analyses, including IC3/PDR,
interpolation-based model checking, and k-induction. Information discovered
by the hardware model checkers is represented as verification witnesses.
In the competition, CPV achieved performance comparable to that of participants
whose intermediate representations are based on control-flow graphs.
In the category ReachSafety, it outperformed several mature software verifiers
as a first-year participant. CPV demonstrates the feasibility of sequential
circuits as an alternative intermediate representation for program analysis
and enables a head-to-head algorithmic comparison between hardware and
software verification.


Keywords: Software verification - Hardware verification - C programs -
Sequential circuits - BTOR2 - AIGER - Tool combination - Portfolio


1 Introduction 


Software verification is challenging. Numerous intermediate representations have 
been proposed to capture diverse software features and facilitate the development 
of program verifiers. Among various encodings of a state-transition system, sequen- 
tial circuits, consisting of memory elements to represent states and combinational 
logic to capture state transitions, are commonly used in the hardware-verification 
domain, and abundant techniques have been invented for hardware model checking. 
Using sequential circuits as its intermediate representation, our tool CPV aims to 
answer the following question: Are sequential circuits feasible as an alternative 
foundation to build software verifiers? While previous studies on translating and 
cross-applying verification techniques for hardware and software exist [1, 2,3, 4], 
to our knowledge, no participants in SV-COMP had used sequential circuits as 
their intermediate representations. This competition report outlines the software 
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Fig. 1: Software architecture of CPV 


architecture and verification approach of CPV and discusses its results against 
other mature program analyzers in SV-COMP 2024 [5]. 


2 Software Architecture 


The software architecture of CPV is depicted in Fig. 1. Its verification workflow
is divided into two stages. (1) In the frontend (the upper half of Fig. 1), an
input C program with a reachability-safety property is first instrumented to
allow for witness translation (details in Sect. 3) and then translated into a
word-level BTOR2 [7] circuit by KRATOS2 [6]. The BTOR2 language [7] is used in the
Hardware Model Checking Competitions [14, 15], and many powerful hardware
model checkers support this format. A bit-level AIGER [9] circuit is also generated
by the tool BTOR2AIGER [8]. (2) In the backend (the lower half of Fig. 1), CPV
invokes the hardware model checkers AVR [10] and ABC [11] to verify the translated
circuits. BTOR2 verification witnesses produced for the circuits are translated to
software witnesses in the GraphML format [13] for the original program. CPV
configures and executes the backend model checkers (either individually or as portfolios)
via COVERITEAM [12], a library for cooperative verification [16]. Thanks to the
versatility of COVERITEAM, it is convenient to choose the verification algorithms
used by AVR and ABC, and the pool of backend verifiers in CPV can be
expanded with little effort.


3 Verification Approach 


The approach of CPV is to translate a program into a circuit and apply hardware
model checking to the translated verification task. To generate software-verification
witnesses, CPV instruments an input program before translating it to a circuit, 
such that the information contained in a witness for the translated circuit can 
be mapped back to the original program. 


Program-to-Circuit Translation. CPV uses KRATOS2 [6] as its frontend to
translate a verification task of a C program into a word-level sequential circuit in
the BTOR2 format [7]. KRATOS2 applies large-block encoding [17] and introduces a
symbolic program counter to fold the summarized program into a state-transition
system. Executing a maximal loop-free block of the program is a one-step transition
in the system. A call to an external function that models nondeterministic input
values to the program, e.g., the functions __VERIFIER_nondet_X() in SV-COMP,
is represented as an external input to the state-transition system. We configure
KRATOS2 to export the system as a sequential circuit in the BTOR2 format because
BTOR2 is the prevailing format for hardware model checking. In order to leverage
bit-level hardware model checkers, CPV additionally invokes BTOR2AIGER [8] to
translate the word-level BTOR2 circuit into the AIGER format [9]. Currently, CPV
supports only the property of reachability safety. A violation of the reachability-safety
property of the input program is captured by a circuit output asserting the
equality between the symbolic program counter and the error location.
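To illustrate how a symbolic program counter folds a program into a state-transition system, the sketch below hand-encodes a tiny hypothetical program as a next-state function over (pc, x). The nondeterministic value is treated as an external input. Reachability of the error location is then decided by enumerating all input sequences up to a bound. This sketch makes several assumptions:
- The program, its block boundaries, and all names are hypothetical.
- KRATOS2's actual output is a BTOR2 circuit.
- Hardware model checkers reason symbolically rather than by enumeration.

```c
#include <assert.h>

enum pc { PC_LOOP, PC_EXIT, PC_ERR };   /* symbolic program counter values */

typedef struct { enum pc pc; int x; } State;

/* Hypothetical source program:
     int x = 0;
     while (x < 10) { if (nondet()) x += 3; else x += 2; }
     if (x == err_val) reach_error();
   Each loop-free block is one transition; 'in' is the external input that
   models the nondet() call. */
static State next_state(State s, int in, int err_val) {
  State t = s;
  if (s.pc == PC_LOOP) {
    if (s.x < 10) t.x = s.x + (in ? 3 : 2);               /* loop body */
    else t.pc = (s.x == err_val) ? PC_ERR : PC_EXIT;      /* loop exit */
  }
  return t;                              /* PC_EXIT and PC_ERR stutter */
}

/* Circuit output: asserted iff the program counter is the error location. */
static int bad(State s) { return s.pc == PC_ERR; }

/* Bounded reachability: tries every input sequence of length <= k
   (a brute-force stand-in for bounded model checking on this example). */
int error_reachable_within(State s, int k, int err_val) {
  if (bad(s)) return 1;
  if (k == 0) return 0;
  for (int in = 0; in <= 1; in++)
    if (error_reachable_within(next_state(s, in, err_val), k - 1, err_val))
      return 1;
  return 0;
}
```

For err_val = 11, the inputs 1, 1, 1, 0 lead to x = 11, so the error output is asserted within six transitions. For err_val = 13 it is unreachable, because x can only exit the loop with a value of 10, 11 or 12.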


Hardware Model Checking. CPV employs AVR [10] and ABC [11], two 
state-of-the-art hardware model checkers for word-level BTOR2 and bit-level 
AIGER circuits, respectively, to analyze the translated circuits. A hardware model 
checker decides whether the translated circuit has a computation trace to assert 
its circuit output, which indicates the error location in the original program is 
reachable. In this case, the verification verdict is false, and the original program 
is unsafe. If there is no trace to assert the circuit output, the verdict is true, 
and the original program is safe. 

To achieve synergy, we combine the strengths of various hardware-verification 
algorithms, including property-directed reachability (PDR) [18, 19], interpolation- 
based model checking (IMC) [20], k-induction (KI) [21], and bounded model 
checking (BMC) [22]. For the tasks that can be translated into AIGER circuits,¹
a sequential portfolio of AVR-KI, AVR-PDR, ABC-IMC, ABC-PDR, and AVR-BMC
is applied. A predetermined time limit is imposed on each component in
the portfolio by COVERITEAM. AVR is executed first in the portfolio because it
can produce a BTOR2 witness [7] for the translated circuit if a property violation
is found, whereas ABC does not export witnesses in a standardized format.
CPV can then translate a BTOR2 witness back to a software violation witness.
Currently, CPV outputs a dummy violation witness if a bug is reported by
ABC. Since neither BTOR2 nor AIGER defines a format for correctness
witnesses, CPV also outputs a dummy correctness witness when a program is
proven safe. For the remaining tasks that cannot be translated into AIGER circuits, CPV
uses a sequential portfolio of AVR's KI, PDR, and BMC.
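The control flow of such a sequential portfolio can be sketched as follows. Each component is a stub function pointer standing in for a model-checker run under its time limit. The text does not give CPV's actual time limits, and in CPV the invocation goes through COVERITEAM. All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Verdict of one backend run; RES_UNKNOWN covers timeouts and errors. */
enum result { RES_TRUE, RES_FALSE, RES_UNKNOWN };

/* Stand-in for invoking one model checker under its time limit. */
typedef enum result (*Checker)(void);

typedef struct {
  const char *name;   /* e.g. "AVR-KI" */
  Checker run;
} Component;

/* Runs the components in order and stops at the first definite verdict.
   *winner receives the index of the deciding component, or -1. */
enum result run_sequential_portfolio(const Component *comps, size_t n, int *winner) {
  for (size_t i = 0; i < n; i++) {
    enum result r = comps[i].run();
    if (r != RES_UNKNOWN) {
      *winner = (int)i;
      return r;
    }
  }
  *winner = -1;
  return RES_UNKNOWN;
}

/* Example stubs for exercising the control flow. */
enum result stub_unknown(void) { return RES_UNKNOWN; }
enum result stub_false(void) { return RES_FALSE; }
```

For an AIGER-translatable task, the components would be listed in the order AVR-KI, AVR-PDR, ABC-IMC, ABC-PDR, AVR-BMC. Placing the AVR components first means that a bug found early comes with a BTOR2 witness that can be translated back.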


Program Instrumentation for Witness Translation. To map the information 
in a BTOR2 witness back to the original program, CPV instruments the input 
program prior to the program-to-circuit translation. A BTOR2 violation witness 
encodes a computation trace that asserts the output of the translated circuit. The 
trace consists of a sequence of values given to the circuit's external inputs, each
corresponding to a call to a function __VERIFIER_nondet_X() in the program.


¹ The BTOR2-to-AIGER translation may fail if a BTOR2 circuit uses data sorts or
operations unsupported by AIGER, such as arrays or non-constant register initialization.
Table 1: Summary of CPV's correct results in SV-COMP 2024

ReachSafety           #tasks solved by respective approach
verdict   #tasks      Total   AVR-KI   AVR-PDR   AVR-BMC   ABC-IMC
true        8323       3860     3405       323         0       132
false       2899       1092      867       172         2        51


To assume these values at the control-flow locations where they are relevant for 
triggering the property violation, CPV’s instrumentor assigns a fresh counter 
to each of these calls. A counter is incremented after each call, so its value can 
be inferred from the BTOR2 witness. An input value is relevant if accompanied 
by a change in its counter. The witness translator of CPV traverses the BTOR2 
witness, extracts the relevant input values by tracking the changes in the counters, 
and exports the software violation witness in the GraphML format [13]. 
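The counter-based filtering step can be sketched as follows. This is a minimal Python illustration, assuming the witness has already been parsed into per-step input values and per-step counter values; the data layout and the call-site names `nondet_0` and `nondet_1` are hypothetical, and parsing the textual BTOR2 witness format is omitted.

```python
from typing import Dict, List, Tuple

# One step of a (hypothetical, simplified) circuit trace: the value offered on the
# input of each nondet call site, and each call site's counter after the step.
Step = Tuple[Dict[str, int], Dict[str, int]]


def extract_relevant_inputs(
    initial_counters: Dict[str, int], steps: List[Step]
) -> List[Tuple[int, str, int]]:
    """Return (step index, call site, value) for every input whose counter changed.

    An unchanged counter means the program did not execute the corresponding
    __VERIFIER_nondet_X() call in that step, so the input value is irrelevant."""
    relevant = []
    previous = dict(initial_counters)
    for t, (inputs, counters) in enumerate(steps):
        for site, value in inputs.items():
            if counters[site] != previous[site]:
                relevant.append((t, site, value))
        previous = dict(counters)
    return relevant


trace = [
    ({"nondet_0": 5, "nondet_1": 9}, {"nondet_0": 1, "nondet_1": 0}),
    ({"nondet_0": 7, "nondet_1": 3}, {"nondet_0": 1, "nondet_1": 1}),
    ({"nondet_0": 2, "nondet_1": 4}, {"nondet_0": 2, "nondet_1": 1}),
]
print(extract_relevant_inputs({"nondet_0": 0, "nondet_1": 0}, trace))
# [(0, 'nondet_0', 5), (1, 'nondet_1', 3), (2, 'nondet_0', 2)]
```

Only the three values whose counters moved would end up in the software violation witness; the other three circuit inputs are unconstrained noise in the hardware trace.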


4 Results in SV-COMP 2024 


CPV participated in the category ReachSafety of SV-COMP 2024 [5]. As a first- 
year participant, it surprisingly outperformed several mature software verifiers 
in terms of the number of correctly solved tasks. CPV is especially effective in
the subcategories ReachSafety-Hardware and ReachSafety-ECA, solving the second
and third most tasks among all participants, respectively. These results
demonstrate the feasibility of using sequential circuits as an alternative
intermediate representation for constructing program verifiers.

The overall results of CPV are summarized in Table 1. Among the 11 222 veri-
fication tasks in the category ReachSafety, 8439 were successfully translated to
BTOR2 circuits by Kratos2, and 7773 could be further translated to AIGER
circuits by Btor2Aiger. In total, CPV produced 4952 correct and confirmed
results. The k-induction implementation in AVR contributed the most correctly
solved and confirmed tasks, followed by PDR of AVR and IMC of ABC.

We will improve CPV in the following directions: First, we will generate
non-trivial software correctness witnesses by extracting and translating
the fixed points computed by hardware model checkers. We aim to enhance the
witness-confirmation rate of CPV, currently about 90%, to the level of other 
mature participants (more than 95%). Second, we will investigate the 27 false 
alarms in the subcategory ReachSafety-Hardness. 


5 Setup and Configuration 


We submitted CPV at version 0.4 [23] to SV-COMP 2024 [5]. A Linux-based oper- 
ating system is required to execute the tool, as the used library CoVERITEAM [12] 
relies on Linux-specific features, such as control groups, namespaces, and
overlay file systems. Additional Python package requirements and the instructions
to set up the execution environment can be found in the README file of the
submitted tool archive. 


2 The observations are specific to the order of algorithms in CPV’s sequential portfolios. 
Data-Availability Statement. CPV is an open-source project, developed and
maintained by the Software and Computational Systems Lab at LMU Munich. 
Its source code and executables are archived on Zenodo [23], and the project is 
maintained on GitLab at https://gitlab.com/sosy-lab/software/cpv. 


Funding Statement. This project was funded in part by the Deutsche Forschungs- 
gemeinschaft (DFG) — 378803395 (ConVeY) and the LMU Postdoc Support Fund. 
Abstract. THETA is a model checking framework conventionally based
on abstraction refinement techniques. While abstraction is useful for a 
large number of verification problems, the over-reliance on the technique 
led to THETA being unable to meaningfully adapt. Identifying this prob- 
lem in previous years of SV-COMP has led us to create EMERGENTHETA, 
a sandbox for the new approaches we want THETA to support. By differ- 
entiating between mature and emerging techniques, we can experiment 
more freely without hurting the reliability of the overall framework. In 
this paper we detail the development route to EMERGENTHETA, and its
debut at SV-COMP’24 in the ReachSafety category.


Funding. This research was partially funded by the UNKP-23-{2,3}-I New National 
Excellence Program; Project no. 2019-1.3.1-KK-2019-00004 (implemented with the 
support provided from the NRDI Fund of Hungary under the 2019-1.3.1-KK fund- 
ing scheme); and the Doctoral Excellence Fellowship Programme (funded by the NRDI 
Fund of Hungary and the BME University). 


1 Software Architecture 


THETA is a modular and configurable verification framework in the sense that 
multiple frontend subprojects are served by a vastly configurable, CEGAR-based
backend [6, 10]. Frontends include Petri nets, AIGER models, timed automata,
and C programs among others (hence the modularity), and the CEGAR backend
provides fine-grained access to its internal settings such as refinement and search
strategy, abstract domains, and solver selection (hence the configurability). It is,
however, not conventionally capable of using non-CEGAR-based analyses. This
behavior is ingrained in the implementation in multiple ways, such as coun-
terexamples and safety proofs requiring a partial or full abstract reachability 
graph, and the interface of the backend containing references to precision [6].
Our main contribution as part of SV-COMP’24 is the removal of such dependen-
cies on abstraction-specific classes. This enables the rapid prototyping and de-
velopment of diverse verification algorithms, such as this year’s BMC, IMC, and
k-induction algorithms [5, 7, 9], on top of the low-level core of THETA, including
the representation and manipulation of expressions and the interfaces to several
SMT solvers.


To facilitate the implementation of these algorithms, we introduced a new 
MonolithicTransitionFunction interface to Theta, which returns a single non- 
deterministic action representing the whole transition system (i.e., it represents 
the structural information as additional variables and related guards). This is 
a counterpart to the previously existing TransitionFunction interface, which di- 
rectly relies on the structural information for the enabledness of actions. This 
interface has been implemented for most of the formalisms supported by THETA. 
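The idea behind a monolithic transition function can be illustrated on a tiny control-flow automaton. This is a minimal Python sketch, not THETA's Java interface: the names `Edge` and `monolithic_step` are hypothetical, and the sketch computes concrete successors, whereas the interface described above yields a single symbolic action. The location becomes an ordinary variable, and each edge turns into one guarded disjunct of the merged transition.

```python
from typing import Callable, Dict, List, NamedTuple

State = Dict[str, int]  # variable name -> value; the control location is stored under "loc"


class Edge(NamedTuple):
    src: int
    guard: Callable[[State], bool]
    update: Callable[[State], State]
    dst: int


def monolithic_step(edges: List[Edge]) -> Callable[[State], List[State]]:
    """Merge all edges into one nondeterministic transition over an explicit 'loc' variable.

    Each edge becomes the disjunct  loc == src && guard  ->  (update, loc := dst)."""
    def step(state: State) -> List[State]:
        successors = []
        for edge in edges:
            if state["loc"] == edge.src and edge.guard(state):
                successor = dict(edge.update(state))
                successor["loc"] = edge.dst
                successors.append(successor)
        return successors
    return step


# while (x < 3) x++;   with location 0 = loop head and location 1 = loop exit
loop_edges = [
    Edge(0, lambda s: s["x"] < 3, lambda s: {**s, "x": s["x"] + 1}, 0),
    Edge(0, lambda s: s["x"] >= 3, lambda s: dict(s), 1),
]
step = monolithic_step(loop_edges)
print(step({"loc": 0, "x": 0}))  # [{'loc': 0, 'x': 1}]
print(step({"loc": 0, "x": 3}))  # [{'loc': 1, 'x': 3}]
```

Because the structure is now encoded in data (`loc`) and guards, algorithms such as BMC, k-induction, and IMC only ever need to unroll this one transition.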


Besides the changes detailed above, EMERGENTHETA still relies on THETA’s 
ANTLR-based C frontend and integrated support for SMT-solvers, as well as its 
existing counterexample-to-witness projection [1]. 


2 Verification Approach 


In bounded model checking (BMC) [5], the transition system and the safety prop- 
erty are encoded as SMT [3] formulas. In each iteration of the algorithm, a path 
constraint is created from the formulas characterizing all execution paths of a 
given length k that start in an initial state and end in an error state. The satis- 
fiability of the path constraint is checked using an SMT solver [8]. If a satisfying 
assignment is found, it is returned as a counterexample; otherwise, the bound k is
increased for as long as the available resources allow.
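The BMC loop can be sketched on a small explicit transition system. In this minimal Python illustration, exhaustive path enumeration stands in for the SMT satisfiability check, so it only works for tiny finite systems; the example system (a counter modulo 8 that adds 1 or 2) is hypothetical.

```python
from typing import Callable, Iterable, List, Optional

State = int


def bmc(
    init: Iterable[State],
    succ: Callable[[State], Iterable[State]],
    bad: Callable[[State], bool],
    max_bound: int,
) -> Optional[List[State]]:
    """Bounded model checking over a small explicit transition system.

    For k = 0, 1, ..., max_bound, look for a path of exactly k steps that starts
    in an initial state and ends in a bad state. Enumerating paths stands in for
    checking the satisfiability of the path constraint with an SMT solver."""
    paths = [[s] for s in init]  # all paths with k = 0 steps
    for k in range(max_bound + 1):
        for path in paths:
            if bad(path[-1]):
                return path  # plays the role of the satisfying assignment
        if k < max_bound:
            paths = [p + [t] for p in paths for t in succ(p[-1])]
    return None  # no counterexample up to max_bound; nothing is proved beyond it


# A counter modulo 8 that nondeterministically adds 1 or 2; the bad state is 5.
def step_1_or_2(x: State) -> List[State]:
    return [(x + 1) % 8, (x + 2) % 8]


print(bmc([0], step_1_or_2, lambda x: x == 5, max_bound=10))  # [0, 1, 3, 5]
```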


BMC is incomplete as it can only prove the absence of counterexamples 
up to a finite depth. K-induction [9] and interpolation-based model checking 
(IMC) [7] address this by adding checks that attempt to prove that the property 
holds for unbounded depth based on the unsatisfiability of the BMC query. K- 
induction does so by trying to prove the k-inductivity of the property with k 
being the current BMC length, while IMC derives Craig interpolants to compute 
an overapproximation of the set of reachable states. 
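The k-induction check can likewise be sketched by brute force over a finite state space, again in place of SMT queries. In this minimal Python illustration (the example system is hypothetical), the base case is a BMC check and the step case asks whether k consecutive good states, starting anywhere and not only in reachable states, can only be followed by a good state. The example shows why k may need to grow: the unreachable chain 7, 1, 3 leads to the bad state 5, so the property is only 4-inductive.

```python
from typing import Callable, Iterable, List, Tuple

State = int


def good_paths(starts: Iterable[State], succ: Callable[[State], List[State]],
               prop: Callable[[State], bool], n: int) -> List[List[State]]:
    """All paths of n states that begin in `starts` and satisfy prop everywhere."""
    paths = [[s] for s in starts if prop(s)]
    for _ in range(n - 1):
        paths = [p + [t] for p in paths for t in succ(p[-1]) if prop(t)]
    return paths


def k_induction(states: Iterable[State], init: Iterable[State],
                succ: Callable[[State], List[State]],
                prop: Callable[[State], bool], max_k: int) -> Tuple[str, int]:
    """Return ("true", k), ("false", k), or ("unknown", max_k)."""
    states, init = list(states), list(init)
    for k in range(1, max_k + 1):
        # Base case: every state reachable from init in fewer than k steps satisfies prop.
        frontier = set(init)
        for _ in range(k):
            if any(not prop(s) for s in frontier):
                return "false", k
            frontier = {t for s in frontier for t in succ(s)}
        # Step case: k consecutive good states (from ANY state) are always followed
        # by another good state.
        if all(prop(t) for p in good_paths(states, succ, prop, k) for t in succ(p[-1])):
            return "true", k
    return "unknown", max_k


def add_two(x: State) -> List[State]:
    return [(x + 2) % 8]


print(k_induction(range(8), [0], add_two, lambda x: x != 5, max_k=10))  # ('true', 4)
```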


Based on preliminary testing, we used a simple sequential portfolio (without 
algorithm selection) that executed an IMC-only verification phase first (for at 
most 90 seconds), then fell back to a combined BMC and k-induction-based 
verification phase for the rest of the time limit. EMERGENTHETA did not employ
any of the CEGAR-based analysis methods already present in THETA, as we
wanted to evaluate the newly implemented ones separately.
                All                         False                       True
Category        EmergenTheta  Theta         EmergenTheta  Theta         EmergenTheta  Theta
Arrays          13 (7)        13 (7)        0 (0)         5 (5)         13 (7)        8 (2)
Bit Vectors     16 (1)        23 (8)        7 (0)         10 (3)        9 (1)         13 (5)
Combinations    1 (0)         138 (137)     1 (0)         121 (120)     0 (0)         17 (17)
ControlFlow     7 (4)         9 (6)         1 (0)         2 (1)         6 (4)         7 (5)
ECA             1 (1)         307 (307)     0 (0)         133 (133)     1 (1)         174 (174)
Floats          25 (6)        54 (35)       2 (0)         23 (21)       23 (6)        31 (14)
Hardness        378 (269)     116 (7)       0 (0)         0 (0)         378 (269)     116 (7)
Hardware        134 (15)      194 (75)      60 (10)       89 (39)       74 (5)        105 (36)
Heap            2 (0)         2 (0)         0 (0)         0 (0)         2 (0)         2 (0)
Loops           232 (117)     161 (46)      34 (11)       40 (17)       198 (106)     121 (29)
Sequentialized  1 (0)         47 (46)       1 (0)         34 (33)       0 (0)         13 (13)
XCSP            2 (0)         45 (43)       2 (0)         44 (42)       0 (0)         1 (1)
Overall         812 (420)     1153 (761)    108 (21)      530 (443)     704 (399)     623 (318)


Table 1: Comparison of THETA and EMERGENTHETA for each subcategory 


3 Discussion of Strengths and Weaknesses of the Approach


As our secondary goal (besides adapting THETA's architecture to a more flexible 
one) was to find out how the new algorithms implemented in EMERGENTHETA 
performed, we mainly compare and contrast the results of EMERGENTHETA
(which used only the newly implemented algorithms) and THETA (which used
only CEGAR). In the future, we aim to integrate the new algorithms into our
mainline THETA tool, for which this evaluation is invaluable.

Table 1 compares the number of tasks correctly solved by THETA and EMERGENTHETA
for each subcategory in REACHSAFETY (using official results) [4], distinguishing
between true and false outputs. The numbers in parentheses show
the number of correctly solved tasks that the other tool was unable to solve in 
time. 

Looking at the overall results, we can see that THETA and EMERGENTHETA
are suitable for different tasks: although THETA solved more tasks, EMERGENTHETA
solved 420 tasks that THETA could not solve, which is 36% of the 1153
tasks solved by THETA. With an ideal portfolio, incorporating these algorithms
could significantly increase the number of tasks solved by THETA.
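The figures in this paragraph can be cross-checked with a few lines of arithmetic on the Overall row of Table 1. The "virtual best" count is an upper bound we compute for illustration only: it assumes a hypothetical portfolio that always picks the right tool for each task and loses no time to the other, which a real sequential portfolio cannot achieve.

```python
# Overall row of Table 1: solved tasks (and tasks the other tool did not solve).
emergentheta_solved, emergentheta_only = 812, 420
theta_solved, theta_only = 1153, 761

# Tasks solved by both tools, computed from either side of the table.
shared = emergentheta_solved - emergentheta_only
assert shared == theta_solved - theta_only  # 392 in both cases

# An ideal portfolio that always picks the right tool solves the union.
virtual_best = theta_solved + emergentheta_only
share_of_theta = 100 * emergentheta_only / theta_solved

print(shared, virtual_best, f"{share_of_theta:.0f}%")  # 392 1573 36%
```

The two independent computations of the overlap agree, which is a useful sanity check on the parenthesized columns of the table.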

THETA was much better at finding counterexamples (530 vs. 108 false out-
puts), while EMERGENTHETA was slightly better at proving correctness (704
vs. 623 true outputs). This goes against our intuition, as abstraction refinement
is more tailored to proving correctness. This phenomenon warrants further in- 
vestigation; our current hypothesis is that performing enough refinements to 
eliminate all spurious counterexamples had too large an overhead. More than 
half of the true results for each tool were for tasks that the other one could not
solve, highlighting their complementary nature. 

EMERGENTHETA was significantly better in the Loops and the Hardness
subcategories, while it was worse in Combinations, ECA, Sequentialized, and XCSP.
As for Combinations and Sequentialized, this could be attributed to THETA being 
generally better at finding counterexamples, as false tasks are overrepresented 
in these categories; but for ECA and XCSP, tasks of both types are represented 
nearly equally. 

These relatively positive results were achieved in spite of a misconfiguration:
although our preliminary measurements had shown that CVC5 and MathSAT
performed best with k-induction and IMC, respectively, we accidentally
enrolled EMERGENTHETA configured with its default solver, Z3. We consider this
a failure in the design of the portfolio engine of THETA, which allowed us to submit
a faulty configuration without this being evident from the logs (i.e., that no run
used a solver other than Z3). We will prioritize improving this aspect of THETA for
next year.


4 Tool Setup and Configuration 


EMERGENTHETA remains vastly configurable, and successfully choosing a per- 
formant configuration for a verification task at hand can be complicated. If using 
the competition archive [2] for software verification, we recommend using the pre- 
assembled portfolio: theta-start.sh «input» --backend IMC THEN KIND. To 
minimize the output verbosity and produce a witness in the working directory, 
the flags --loglevel RESULT and --witness-only can be added to the argu- 
ments. We also used these options at SV-COMP 2024. 


5 Software Project and Contributors 


EMERGENTHETA is integrated into the THETA verification framework main-
tained by the Critical Systems Research Group of the Budapest University of
Technology and Economics. The project is available open-source on GitHub
under an Apache 2.0 license. The version (5.0.0) used in the competition is 
available at [2]. 
Abstract. ESBMC implements many state-of-the-art techniques that
combine abstract interpretation and model checking. Here, we report 
on new and improved features that allow us to obtain verification re- 
sults for previously unsupported programs and properties. ESBMC now 
employs a new static interval analysis of expressions in programs to in- 
crease verification performance. This includes interval-based reasoning 
over booleans and integers, and forward-backward contractors. Other 
relevant improvements concern the verification of concurrent programs,
as well as several operational models, both internal ones and those of
libraries such as pthread and the C mathematics library. An extended
memory safety analysis now allows tracking of memory leaks that are 
considered still reachable. 


1 Software Architecture 


ESBMC is a mature, permissively licensed open-source context-bounded 
model checker for the verification of single- and multi-threaded C programs for 
various code safety violations (e.g., buffer overflows, dangling pointers, arith-
metic overflows) and user-defined assertions. It has participated successfully
in SV-COMP for many years, owing to our continuous work on
improving its performance. ESBMC transforms a given C program using
a Clang-based [11] front-end into an intermediate representation in the GOTO
language [3], which is symbolically executed to produce verification formulae
passed to one or more SMT solvers. In addition, ESBMC implements state-of-
the-art incremental BMC and k-induction proof-rule algorithms based on SMT
and Constraint Programming (CP) solvers.
2 Verification Approach


Interval analysis. This year, ESBMC's interval analysis was improved using
abstract interpretation techniques [5]. We used the integer domain (with infinities)
as the abstract domain for SV-COMP. For each statement in the program, the
analysis keeps a box interval (i.e., a minimum and a maximum) for every
variable. ESBMC also supports interval arithmetic and widening strategies
(through extrapolation and interpolation). Once computed, the intervals are used
for optimizations (i.e., dead code elimination and constant folding) and invariant
instrumentation.
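How widening (extrapolation) and a narrowing (interpolation) step interact on a counting loop can be sketched as follows. This is a minimal Python illustration of the textbook interval domain, not ESBMC's C++ implementation; the loop `for (i = 0; i < bound; i++)` and the helper names are hypothetical. The resulting head interval [0, 100] is the kind of fact that could be instrumented as an assumption to help k-induction.

```python
import math
from typing import NamedTuple, Tuple


class Interval(NamedTuple):
    lo: float  # may be -math.inf
    hi: float  # may be math.inf

    def join(self, other: "Interval") -> "Interval":
        return Interval(min(self.lo, other.lo), max(self.hi, other.hi))

    def widen(self, other: "Interval") -> "Interval":
        """Extrapolation: any bound that is still growing jumps to infinity."""
        lo = self.lo if other.lo >= self.lo else -math.inf
        hi = self.hi if other.hi <= self.hi else math.inf
        return Interval(lo, hi)


def loop_body(head: Interval, bound: int) -> Interval:
    """Abstract effect of one loop iteration: assume(i < bound); i = i + 1."""
    guarded = Interval(head.lo, min(head.hi, bound - 1))
    return Interval(guarded.lo + 1, guarded.hi + 1)


def analyze_counting_loop(bound: int) -> Tuple[Interval, Interval]:
    """Intervals of i at the head and at the exit of `for (i = 0; i < bound; i++)`."""
    entry = Interval(0, 0)
    head = entry
    while True:  # increasing iteration with widening
        new_head = head.widen(entry.join(loop_body(head, bound)))
        if new_head == head:
            break
        head = new_head
    # One decreasing (narrowing) step recovers the upper bound lost by widening.
    head = entry.join(loop_body(head, bound))
    exit_ = Interval(max(head.lo, bound), head.hi)  # assume(i >= bound)
    return head, exit_


print(analyze_counting_loop(100))
# (Interval(lo=0, hi=100), Interval(lo=100, hi=100))
```

Widening makes the fixed-point iteration terminate after two rounds regardless of `bound`, at the price of a temporary infinite upper bound that the narrowing step then repairs.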

Regarding the new code instrumentation, the main use of intervals is to 
generate invariants which the k-induction strategy benefits from most. This is 
done by adding assumptions restricting the value of variables. In addition, the 
set of variables used for these assumptions has been reduced to those occurring 
in conditional statements and guards only. Lastly, we expanded the types of 
instrumented statements: assertions, conditionals, and function calls. 


Contractors. ESBMC v7.4 employs another method to refine intervals based
on contractors. Contractors are commonly used in the context of Con-
straint Satisfaction Problems (CSPs), that is, when variables, their (real-valued)
domains, and constraints over those variables are fixed. A contractor is an op-
eration on n-dimensional boxes (products of intervals) that respects the given con-
straints, i.e., it refines the domains such that no solutions to the CSP are lost. A
particularly efficient one for CSPs containing a single constraint is the forward-
backward contractor. It operates in two stages: forward evaluation and
backward propagation. In scenarios with multiple constraints, the forward-
backward contractor is applied to each constraint independently.

ESBMC utilizes the forward-backward contractor implemented in the Ibex 
library B] to refine the results of the interval analysis mentioned above. That is, 
conditions of statements such as “if” and loops in the program are relaxed to 
conditions over reals, where possible, and then the contractor is applied to this 
relaxed condition. The result is a refined set of intervals for the variables involved. 
These refined intervals are then restricted to the original variable domains, which 
— in case of, e.g., integers — results in a further reduction of the size of intervals. 
The intervals contracted in this way generally enhance the results of the interval
analysis employed by ESBMC and benefit its k-induction strategy. 
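The two stages of a forward-backward contractor, followed by the restriction to integer domains, can be sketched for a single constraint. This is a hand-written Python illustration of the general technique, not the Ibex API; the constraint `2*x + y <= 10` and the initial boxes are hypothetical, and the sketch omits the outward rounding that a sound floating-point implementation needs. A result with lo > hi would mean the condition is infeasible.

```python
import math
from typing import Tuple

Interval = Tuple[float, float]


def meet(a: Interval, b: Interval) -> Interval:
    return (max(a[0], b[0]), min(a[1], b[1]))


def add(a: Interval, b: Interval) -> Interval:
    return (a[0] + b[0], a[1] + b[1])


def sub(a: Interval, b: Interval) -> Interval:
    return (a[0] - b[1], a[1] - b[0])


def scale(k: float, a: Interval) -> Interval:  # assumes k > 0
    return (k * a[0], k * a[1])


def contract_2x_plus_y_le(c: float, x: Interval, y: Interval) -> Tuple[Interval, Interval]:
    """Forward-backward contraction of the boxes for x and y under 2*x + y <= c."""
    # Forward evaluation of the expression tree: t = 2*x, s = t + y.
    t = scale(2, x)
    s = add(t, y)
    # Apply the constraint at the root: s must lie in (-inf, c].
    s = meet(s, (-math.inf, c))
    # Backward propagation: project the root interval back onto the leaves.
    t = meet(t, sub(s, y))
    y = meet(y, sub(s, t))
    x = meet(x, scale(0.5, t))
    return x, y


def to_integer(a: Interval) -> Interval:
    """Restrict a real interval to the integers it contains."""
    return (math.ceil(a[0]), math.floor(a[1]))


x, y = contract_2x_plus_y_le(10, (0, 20), (1, 15))
print(x, y, to_integer(x), to_integer(y))  # (0, 4.5) (1, 10) (0, 4) (1, 10)
```

The example shows the extra gain mentioned above: the real-valued contraction yields x in [0, 4.5], and restricting to integers tightens this further to [0, 4].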


Memory leaks. This year, ESBMC employs a refined check for the valid-memtrack
property. Loosely speaking, this property allows only those dynamically
allocated objects to survive that are still reachable at the end of the program's 
execution by following a path of pointers stored in objects eventually referenced 
by global variables. A property violation witness has to contain proof of unreach- 
ability of a dynamic allocation starting from any global variable. 

The new algorithm leverages the existing one tracking the lifetime of al- 
locations for the valid-memcleanup property, but it specifically excludes still- 
reachable objects from the check. This condition is encoded into an SMT formula 
using the paths deterministically described by expressions of type struct, union,
pointer, or array with constant size. Each possible successor along the path is 
obtained through the value-set, and the validity is encoded through guards which 
have to hold at the end of execution. 
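The property being checked can be illustrated concretely on an explicit heap graph at program exit. This minimal Python sketch is not how ESBMC works internally (it encodes the condition symbolically in an SMT formula using value sets and guards); the function name and the object labels are hypothetical. It reports only allocations that are alive but unreachable from every global variable, so reachable but unfreed objects are tolerated, as the refined check requires.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def memtrack_violations(
    global_pointers: Dict[str, List[str]],
    heap_pointers: Dict[str, List[str]],
    live_allocations: Iterable[str],
) -> Set[str]:
    """Return allocations alive at program exit that are NOT reachable from a global.

    global_pointers maps each global variable to the objects it points to;
    heap_pointers maps each allocated object to the objects its fields point to."""
    reachable: Set[str] = set()
    worklist = deque(obj for targets in global_pointers.values() for obj in targets)
    while worklist:
        obj = worklist.popleft()
        if obj in reachable:
            continue
        reachable.add(obj)
        worklist.extend(heap_pointers.get(obj, []))
    return set(live_allocations) - reachable


# The global `head` points to a two-element list; a3 has lost its last pointer.
print(memtrack_violations({"head": ["a1"]}, {"a1": ["a2"]}, ["a1", "a2", "a3"]))  # {'a3'}
```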


C mathematical library. ESBMC v7.4 offers extended support for the math.h
library. Accurate modeling of its semantics is crucial for reasoning on the behav- 
ior of complex floating-point software. For example, most neural network code 
relies on 32-bit floats and may invoke the math.h library to compute the result 
of activation functions, positional encodings, and vector normalisations [12]. 

The IEEE 754 standard [9] mandates bit-precise semantics for only a small subset
of the math.h library. This subset includes addition, multiplication, division,
sqrt, fma, and other support functions such as remquo. In contrast, the behavior
of most transcendental functions (e.g., sin, exp, log) is platform-specific. Still,
the standard recommends correct rounding whenever possible.

As a tradeoff between precision and verification speed, ESBMC now features 
a two-pronged design. For the most commonly used float functions, we borrow
the plain-C implementations of numerical algorithms from MUSL [13]. For the
corresponding double functions, we employ less complex algorithms with
approximate behavior.


Data races. Data races occur when multiple threads concurrently access the same
memory location, and at least one of these accesses involves a write operation.
ESBMC's algorithm for checking data races extends the static code instrumen-
tation used by CBMC [3]. The idea is to add a flag A', initially true, to each variable
A involved in an assignment. Directly after the assignment to A, A' is reset to
false. To identify races, we assert that the value of A' is false when A is ac-
cessed. Below, we outline the challenges encountered by ESBMC and the
improvements we have implemented.

As this method introduces additional instructions into the program, the
potentially larger number of thread interleavings is counteracted by inserting
atomic blocks appropriately: to preserve accuracy, the atomic block
encompasses, in sequence, the assertion on A', the original assignment to A,
and the setting of A'. Data races are now also checked on access of arrays with
non-constant indices. The most challenging aspect of data race detection is the
dereference of pointers, as the dereferenced object would have to be instrumented
but is not statically known from the value-set analysis. Thus, the new implemen-
tation is hybrid, handling cases unsuitable for static analysis during symbolic
execution, thereby enabling ESBMC to detect more types of data races.
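The flag-based instrumentation can be illustrated by exploring all interleavings of a few instrumented threads. This minimal Python sketch follows one plausible reading of the scheme described above, and the exact placement of the flag updates in ESBMC may differ: a writer atomically asserts that A' is clear, assigns A, and raises A'; it clears A' in a separate step; a reader asserts that A' is clear. The helper names are hypothetical, and explicit enumeration stands in for symbolic execution.

```python
from typing import Callable, Dict, List, Tuple

Shared = Dict[str, int]
Step = Callable[[Shared], bool]  # mutates the shared state; False = assertion failed


def writer(value: int) -> List[Step]:
    def atomic_write(s: Shared) -> bool:
        ok = s["A_flag"] == 0  # assert(!A')
        s["A"] = value         # original assignment to A
        s["A_flag"] = 1        # raise A'
        return ok

    def clear(s: Shared) -> bool:
        s["A_flag"] = 0        # reset A' right after the assignment
        return True

    return [atomic_write, clear]


def reader() -> List[Step]:
    def atomic_read(s: Shared) -> bool:
        return s["A_flag"] == 0  # assert(!A') when A is accessed

    return [atomic_read]


def race_detected(threads: List[List[Step]]) -> bool:
    """Explore every interleaving; True if some instrumented assertion can fail."""
    def explore(state: Shared, pcs: Tuple[int, ...]) -> bool:
        for i, prog in enumerate(threads):
            if pcs[i] < len(prog):
                s = dict(state)
                if not prog[pcs[i]](s):
                    return True
                if explore(s, pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]):
                    return True
        return False

    return explore({"A": 0, "A_flag": 0}, tuple(0 for _ in threads))


print(race_detected([writer(1), reader()]))    # True: the read can hit the raised flag
print(race_detected([writer(1) + reader()]))   # False: a single thread cannot race
```

Making the assertion, assignment, and flag update one atomic step keeps the number of extra interleavings small: only the separate `clear` step adds a new scheduling point per write.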


3 Strengths and Weaknesses 


The improved interval analysis provides better invariants for ESBMC. The
new optimizations help ESBMC to solve new benchmarks in categories with
multiple path conditions (e.g., ECA). The main weakness of the method is that
our Abstract Interpreter only has partial support for widening, and it is not
context-aware (i.e., function parameters and global variables cannot be tracked 
globally). This results in a slowdown for categories with loops with thousands 
of statements (e.g., Hardware). 

While contractors are highly regarded for their ability to provide assured 
limits on solutions, their cautious approach may lead to overly broad results and 
less precise conclusions. Therefore, a more rigorous evaluation of contractors is 
essential to assess their advantages and limitations effectively. 

The new algorithm for the valid-memtrack sub-property allowed ESBMC to 
identify 70/153 violations correctly with no incorrect verdicts (last year: 0/134). 
There is a theoretical weakness in the current implementation concerning dy- 
namic allocations only reachable through pointers stored in arrays of statically 
unknown size. This could result in incorrect-false verdicts, but this has not yet
been observed in test cases.

Without operational models of the math.h library, ESBMC would assign non- 
deterministic results, which may cause incorrect counterexamples to be returned. 
This behavior is especially evident for older versions of ESBMC on neural net-
work code [12], as it usually contains many mathematical operations. ESBMC
v7.4 fixes this semantic issue by providing explicit operational models for many
common functions in math.h, thus yielding no incorrect results on the bench-
marks in [12] and achieving second place in the ReachSafety-Floats sub-category.

From the competition results, the data race detection of ESBMC v7.4 is 
promising. Compared to the previous version, the new algorithm supports more 
types of expressions and reduces the verification time. The relatively high number 
of 2.296 incorrect-true verdicts is mostly due to still missing support for detecting 
data races during dereferences of pointers to compound types. 

We will address the weaknesses identified in this competition in the future. 


4 Tool Setup and Configuration


To set up and run ESBMC, follow the instructions in the README.md file. ESBMC
can also be run via the Python wrapper esbmc-wrapper.py for simplified usage 
in the competition. An example command line is: 

esbmc-wrapper.py -s kinduction -a 64 -p unreach-call.prp example.c 


5 Software Project 


The ESBMC development is funded by ARM, EPSRC EP/T026995/1, EPSRC 
EP/V000497/1, Ethereum Foundation, EU H2020 ELEGANT 957286, UKRI 
Abstract. GOBLINT is an abstract interpreter of C programs, focusing
on the analysis of multi-threaded code. It is equipped with a variety of 
abstract domains, as well as analyses which allow it to reason about an 
array of program properties in a highly configurable manner. GOBLINT 
has been extended with support for the detection of memory safety bugs 
and non-termination. 


1 Verification Approach 


GOBLINT is an abstract-interpretation-based static analyzer of C code, with 
an emphasis on the sound analysis of multi-threaded programs [14, 15]. It uses 
side-effecting constraint systems [2] to combine context-sensitive analysis of local 
states with flow-insensitive analysis of data possibly shared between threads. 
GOBLINT is equipped with a range of different analyses that, in turn, build on 
multiple abstract domains for expressing candidate program invariants. 


1.1 Memory Safety 


Techniques for detecting memory-related bugs have been extensively studied [6, 
9, 10, 20]. While GOBLINT did not target such bugs in the past, new analyses 
for the sound analysis of memory safety have been added for SV-COMP 2024. 
The analyzer already tracks abstract address sets for pointer variables. A single 
abstract address consists of a variable and an abstract offset. The analyzer distin- 
guishes between regular program variables and allocated memory blocks, which 
are identified by their respective allocation sites together with the allocating 
thread and possibly an allocation counter. 
The new analyses are concerned with the detection of the following memory-
safety bugs: invalid memory deallocations, invalid pointer dereferences, as well 
as memory leaks. Beyond null-pointer dereferences, two further kinds of invalid 
dereferences are now considered: memory out-of-bounds accesses and use-after- 
free (UAF) bugs. Memory out-of-bounds accesses can be uncovered by obtaining 
the size, as well as the offset from the base address of the memory being accessed. 
To determine whether an access via some offset may be out of bounds, the anal- 
ysis relies on an expressive combination of integer domains including intervals. 
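The basic out-of-bounds test on an interval of offsets can be sketched as follows. This minimal Python illustration is not GOBLINT's OCaml code; it assumes byte offsets, a block size that is known exactly, and 4-byte `int`, whereas GOBLINT combines several integer domains and may only know the size abstractly.

```python
def classify_access(offset_lo: int, offset_hi: int, access_size: int, block_size: int) -> str:
    """Classify an access of access_size bytes at a byte offset in [offset_lo, offset_hi]
    into a memory block of exactly block_size bytes."""
    if offset_lo >= 0 and offset_hi + access_size <= block_size:
        return "safe"
    if offset_hi < 0 or offset_lo + access_size > block_size:
        return "definitely out of bounds"
    return "may be out of bounds"  # a sound analyzer still has to warn here


# int a[10] (40 bytes, assuming 4-byte int), accessed as a[i]:
print(classify_access(0, 36, 4, 40))   # i in [0, 9]   -> safe
print(classify_access(0, 40, 4, 40))   # i in [0, 10]  -> may be out of bounds
print(classify_access(40, 48, 4, 40))  # i in [10, 12] -> definitely out of bounds
```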

Invalid dereferences due to use-after-frees can be detected in the single- 
and multi-threaded case. For the single-threaded setting, the analysis uses the 
allocation-site abstractions in order to keep track of potentially already deal- 
located memory, and warns on accesses to such memory. Regarding the multi- 
threaded case, it additionally leverages GOBLINT’s side-effecting functionality 
by maintaining a global invariant that, for each piece of deallocated memory, 
collects the set of all threads that may free it. GOBLINT tracks abstract thread 
IDs which allow reasoning about which threads may run in parallel [17]. The 
may-happen-in-parallel (MHP) information from the abstract thread ID domain 
(and a dedicated analysis of thread joins) is used to infer whether an access to 
a piece of memory may happen in parallel with (or after) the deallocation of 
the same piece of memory by another thread. In addition, invalid frees due to 
possibly occurring double frees are flagged by this analysis as well. 

Potential memory leaks can be detected thanks to a dedicated analysis. To 
this end, all allocated memory blocks are tracked path- and context-sensitively. 
Furthermore, the allocation counter is relied on to potentially exclude memory 
leaks for a particular allocation site. Calls to deallocating functions, such as free, 
have the effect of removing pieces of tracked (and now deallocated) memory from
the state, whenever the analysis determines that the passed pointer must point to 
an abstract block of memory which describes a single concrete memory location. 
At all exit points of the program, it is then checked whether the set of possibly 
still allocated memory is empty. In case any such set is non-empty, a memory 
leak is reported. In the multi-threaded case, the analysis checks the following 
stronger property and warns whenever that property may be violated: 


1. all threads have terminated at the end point of main, and 
2. exit and similar functions, causing early termination, are not called, and 
3. at its return, each thread has freed all the memory it allocated. 


This property allows for a thread-modular analysis, where sets of allocated and 
freed memory are maintained in a flow- and context-sensitive manner. 
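For the single-threaded case, the leak analysis described above can be sketched along one program path. This minimal Python illustration is not GOBLINT's implementation: the state is a set of allocation sites whose memory may still be allocated, and the set `single_block_sites` is a hypothetical stand-in for the allocation-counter information that tells whether a site denotes a single concrete block.

```python
from typing import FrozenSet, Set

Alloc = FrozenSet[str]  # allocation sites whose memory may still be allocated


def on_malloc(state: Alloc, site: str) -> Alloc:
    return state | {site}


def on_free(state: Alloc, may_point_to: Set[str], single_block_sites: Set[str]) -> Alloc:
    """Drop a site only if the pointer must point to exactly that abstract block
    and the block stands for a single concrete allocation."""
    if len(may_point_to) == 1:
        (site,) = may_point_to
        if site in single_block_sites:
            return state - {site}
    return state  # unsure which concrete block was freed: keep tracking it


def may_leak_at_exit(state: Alloc) -> bool:
    return len(state) > 0


# p = malloc(..) at site "m1" (executed once); q = malloc(..) at site "m2" inside a loop.
single = {"m1"}
s: Alloc = frozenset()
s = on_malloc(s, "m1")
s = on_malloc(s, "m2")
s = on_free(s, {"m1"}, single)  # free(p): m1 is removed
s = on_free(s, {"m2"}, single)  # free(q): m2 may denote many blocks, so it stays
print(sorted(s), may_leak_at_exit(s))  # ['m2'] True
```

Keeping `m2` in the state may yield a spurious warning, but it is what keeps the analysis sound: the analysis only stops tracking memory it knows for certain has been freed.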

We remark that the analysis for memory leaks tracks which heap-allocated 
memory may not be freed yet, while the analysis to detect UAF issues tracks 
which memory may potentially already be freed. One direction of improvement 
would be to consider tracking relational pointer information along the lines 
of Seidl et al. [18] and, additionally, consider relational information about the 
lengths of arrays and memory blocks. This may be useful in the case of vari- 
able length arrays and dynamically allocated memory for which the size is not 
statically known. 
1.2 Termination


A termination analysis has been added, largely leveraging existing features of 
the framework. This highlights the versatility of the framework. To account for 
non-termination due to loops, a counter variable is inserted into each loop and 
incremented in every loop iteration. A relational polyhedra analysis based on 
APRON [8] is then used to determine whether the counter variable is bounded. 
To detect potential non-termination due to recursion, the notion of a call graph is
enhanced by considering functions together with their respective abstract calling
contexts and taking dynamic calls via pointers into account. This graph is
extracted a posteriori from the analysis result and then checked for cycles in
a post-processing phase. In case no cycles (including self-loops) exist in the
abstract call graph, there can be no cycles in the concrete call graph.
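The cycle check on the extracted call graph is a standard depth-first search for back edges. In this minimal Python sketch, nodes are (function, abstract context) pairs; the context labels are hypothetical, and we assume that a dynamic call through a function pointer has already been turned into edges to every callee it may target.

```python
from typing import Dict, Hashable, List

Node = Hashable  # e.g. (function name, abstract calling context)


def has_cycle(call_graph: Dict[Node, List[Node]]) -> bool:
    """Depth-first search for a back edge (self-loops included)."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color: Dict[Node, int] = {}

    def visit(node: Node) -> bool:
        color[node] = GRAY
        for callee in call_graph.get(node, []):
            state = color.get(callee, WHITE)
            if state == GRAY:
                return True  # back edge: recursion may occur
            if state == WHITE and visit(callee):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(call_graph))


acyclic = {("main", "c0"): [("f", "c1"), ("g", "c2")], ("f", "c1"): [("g", "c2")]}
recursive = {("main", "c0"): [("fact", "n>0")], ("fact", "n>0"): [("fact", "n>0")]}
print(has_cycle(acyclic), has_cycle(recursive))  # False True
```

If the check returns False, the abstract call graph over-approximates all concrete calls, so no concrete recursion and hence no recursion-induced non-termination is possible.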

The currently implemented termination analysis is just a first step in the real- 
ization of related techniques. Future work may, e.g., be the tuning of the abstract 
contexts for this use-case, or the incorporation of more involved techniques for 
termination analysis by abstract interpretation [4, 5]. Extending the presented 
approaches to the non-termination of concurrent programs while remaining as 
thread-modular as possible seems particularly challenging. 


2 Software Architecture 


GOBLINT is implemented in ~54,000 lines of OCAML and uses an updated fork 
of CIL [12] as its parser frontend for the C language. It depends on APRON [8] 
for relational analyses. No other major libraries or external tools are required. 

The modular architecture of GOBLINT [1] allows a combination of analyses 
to be selected and automatically configured at runtime [15]. Analyses are defined 
through their abstract domains and transfer functions, which can communicate 
with other analyses using predefined queries and events. The combined analyses 
together with the control-flow graphs of the functions yield a side-effecting con- 
straint system [2], which is solved using a local generic solver [19]. The solution 
is post-processed to determine the verdict and construct a witness. 


3 Strengths and Weaknesses 


GOBLINT once again demonstrated its soundness in this year's competition, i.e., 
it did not produce any false negatives. The only other tools that did not produce 
any false negatives are AISE [21] (competing only in ReachSafety-Loops), BRICK 
(competing in three sub-categories of ReachSafety), and Mopsa [11] (competing 
in all categories except ConcurrencySafety and Termination). GOBLINT is thus 
the only sound tool in SV-COMP 2024 to support all properties, and the only
sound tool represented in the overall ranking. Among the tools participating in 
the overall ranking, GOBLINT, despite targeting only proofs — which are tradi- 
tionally considered to be more time-consuming than finding counter-examples — 
leads the pack in terms of points achieved in < 9s. This is most pronounced when 
considering runtimes < 1s. This highlights the efficiency of GOBLINT. Beyond
these observations, we briefly discuss the newly added analyses here. Support for 
soundly detecting memory safety bugs greatly broadens the applicability of the 
analyzer, evidencing the flexibility of the underlying framework. Of particular 
note is the support for verifying the memory-safety of multi-threaded programs 
in a thread-modular way, yielding the second-best score in ConcurrencySafety- 
MemSafety, after DEAGLE [7]. Turning to termination analysis, the added anal- 
ysis demonstrates that a considerable chunk of the SV-COMP benchmarks in 
this category can be handled by using our extended dynamic call graph to deal 
with recursion and ghost counters together with numerical relational domains to 
deal with loops. Finally, GOBLINT now comes with dedicated support for ana-
lyzing programs using setjmp/longjmp and flagging their misuse [16]. We have
contributed programs using this language feature to the benchmark suite. 
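To make the ghost-counter idea for loops concrete, the following C sketch shows a loop with an added iteration counter. The function `count_up`, the variable `ghost_ctr`, and the exact shape of the instrumentation are assumptions for illustration; GOBLINT's actual encoding may differ, and it establishes the bound statically with a relational domain rather than checking it at run time as done here.

```c
#include <assert.h>

/* Hypothetical illustration: a loop instrumented with a ghost counter.
 * ghost_ctr is not part of the original program; it only counts
 * iterations so that a relational domain can relate it to i and n.
 * Here the relation is merely checked at run time. */
int count_up(int n) {
    int i = 0;
    long ghost_ctr = 0;           /* hypothetical ghost variable */
    while (i < n) {
        ghost_ctr++;              /* one increment per iteration */
        i++;
        /* Relational invariant an analysis could infer: ghost_ctr == i
         * and i <= n, so the counter is bounded by n; a bounded
         * iteration counter implies that the loop terminates. */
        assert(ghost_ctr == i && i <= n);
    }
    return (int)ghost_ctr;
}
```

A non-relational domain such as intervals loses the link between `ghost_ctr` and `i` once widening kicks in, which is why a relational domain is needed to bound the counter.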

A general weakness of GOBLINT currently is that, while it supports expensive
but expressive relational domains such as polyhedra, it lacks a heuristic for deciding
when to activate them, and thus only uses them for termination analysis. Activating
these domains based on some program properties, or attempting analysis with 
such expensive domains after an analysis without them was inconclusive, may 
help to improve the precision of the analyzer without compromising its efficiency. 


4 Tool Setup and Configuration 


GOBLINT version svcomp24-0-gc2e9465a7 participated in SV-COMP 2024 [3,
13]. It is available in both binary (Ubuntu 22.04) and source code form at our
GitHub repository. Instructions for building from source can be found in the
README. Both the tool-info module and the benchmark definition for SV-COMP 
are named goblint. They correspond to running the tool as follows: 


./goblint --conf conf/svcomp24.json \ 
--set ana.specification property.prp input.c 


GOBLINT participated in all categories, while opting out of FalsificationOverall.


5 Software Project and Contributors 


GOBLINT development takes place on GitHub, while related publications are
listed on its website. It is an MIT-licensed project initiated by Technische
Universität München and the University of Tartu.
Data Availability Statement. All data of SV-COMP 2024 are archived as described
in the competition report [3] and available on the competition website. This includes 
the verification tasks, results, witnesses, scripts, and instructions for reproduction. The 
version of GOBLINT as used in the competition is archived on Zenodo [13]. 
Abstract. We present the advances we brought to Mopsa for SV-Comp
2024. We significantly improved the precision of our verifier in the presence
of dynamic memory allocation, library calls such as memset, goto-based
loops, and integer abstractions. We introduced a witness validator
for correctness witnesses. Thanks to these improvements, Mopsa won SV-Comp’s
SoftwareSystems category by a large margin, scoring 2.5 times
more points than the silver medalist, Bubaak-SpLit.


Keywords: Static Analysis - Abstract Interpretation - Competition on 
Software Verification - SV-Comp. 


1 Verification Approach: the Mopsa platform 


Mopsa is an open-source static analysis platform relying on abstract interpreta- 
tion [6]. The implementation of Mopsa aims at exploring new perspectives for 
the design of static analyzers. Journault et al. [8] describe the core Mopsa prin- 
ciples, and Monat [12, Chapter 3] provides an in-depth introduction to Mopsa’s 
design. The C analysis we rely on for this competition is based on the work
of Ouadjaout and Miné [16]; it proceeds by induction on the syntax, is fully
context- and flow-sensitive, and is committed to soundness. This is the second time
Mopsa has participated in SV-Comp [15]. We have brought precision improvements,
described below; they proved decisive for the SoftwareSystems category.


Dynamic memory allocation precision improvements. Mopsa relies on 
the recency abstraction [1] to handle dynamic allocation. For each allocation site, 
this abstraction keeps the last allocated block separated from the others, the 
latter being summarized into a single, weak memory block. Allocation sites are 
customizable [14]; they are usually based on a program location. However, this
summarization can be detrimental to precision. We implemented an alternative
abstraction that keeps memory blocks separated during loop unrolling. This
enhancement, combined with targeted loop unrolling, helped us verify more tasks,
including 246 from the uthash categories. Specifically, we proved half of the
tasks of the uthash-NoOverflow category correct; these are out of reach of the other verifiers.
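The baseline recency abstraction described above can be sketched in C as follows. This is a simplified model, not Mopsa's OCaml implementation: it tracks a single allocation site, abstracts block contents by intervals, and all names (`site`, `site_alloc`, `store_recent`, `store_summary`) are hypothetical. It shows the summarization that Mopsa's new abstraction avoids during loop unrolling.

```c
#include <assert.h>
#include <stdbool.h>

/* Interval abstraction of an int value; empty models "no value yet". */
typedef struct { int lo, hi; bool empty; } itv;

static itv itv_const(int c) { itv r = { c, c, false }; return r; }

static itv itv_join(itv a, itv b) {
    if (a.empty) return b;
    if (b.empty) return a;
    itv r = { a.lo < b.lo ? a.lo : b.lo, a.hi > b.hi ? a.hi : b.hi, false };
    return r;
}

/* Recency abstraction of one allocation site: the most recently
 * allocated block is kept separate (strong updates), all older blocks
 * are merged into a single summary block (weak updates). */
typedef struct {
    itv recent;   /* contents of the most recent block */
    itv summary;  /* join of the contents of all older blocks */
} site;

static void site_init(site *s) {
    itv bot = { 0, 0, true };
    s->recent = bot;
    s->summary = bot;
}

/* Allocation at this site: the previous recent block becomes "old"
 * and is folded into the summary; the new block gets init_val. */
static void site_alloc(site *s, itv init_val) {
    s->summary = itv_join(s->summary, s->recent);
    s->recent = init_val;
}

/* Write through a pointer known to target the recent block: strong update. */
static void store_recent(site *s, itv v) { s->recent = v; }

/* Write through a pointer that may target any old block: weak update. */
static void store_summary(site *s, itv v) { s->summary = itv_join(s->summary, v); }
```

After three allocations whose first two blocks receive 1 and 2, the summary holds [1, 2]: the analysis can no longer tell the two older blocks apart, which is the precision loss mentioned above.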


Integer abstractions. Mopsa previously supported only convex representations of integer
sets, such as intervals. As such, it was impossible to precisely represent cases
where x ∈ [-10, 10] and x ≠ 0. We have resolved this issue by adding an
excluded set domain, which tracks a set of values a given variable cannot take. We
have also implemented the symbolic rewriting domain of Boillot and Feret [4],
which rewrites arithmetic expressions with overflows into simpler ones. This
new implementation has been written in 1,200 lines of OCaml code.
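A minimal C sketch of an interval paired with an excluded-value set, assuming a fixed capacity `MAX_EXCL` and the hypothetical names `itv_excl`, `may_be`, and `assume_neq`. Mopsa's actual domain is written in OCaml and combined with the other numeric domains; this only illustrates how the case x ∈ [-10, 10], x ≠ 0 becomes representable.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_EXCL 8  /* hypothetical bound on the excluded-set size */

/* Interval [lo, hi] plus a finite set of values the variable cannot take. */
typedef struct {
    int lo, hi;
    int excl[MAX_EXCL];
    int n_excl;
} itv_excl;

static bool excludes(const itv_excl *a, int v) {
    for (int k = 0; k < a->n_excl; k++)
        if (a->excl[k] == v) return true;
    return false;
}

/* May the variable take value v? */
static bool may_be(const itv_excl *a, int v) {
    return a->lo <= v && v <= a->hi && !excludes(a, v);
}

/* Refine with the condition "x != c", e.g. from a branch guard. */
static void assume_neq(itv_excl *a, int c) {
    if (c == a->lo) a->lo++;            /* c is a bound: shrink the interval */
    else if (c == a->hi) a->hi--;
    else if (a->lo < c && c < a->hi && !excludes(a, c) && a->n_excl < MAX_EXCL)
        a->excl[a->n_excl++] = c;       /* c is inside: remember it as excluded */
    /* if the set is full, the information is dropped, which is still sound */
}
```

With x initialized to [-10, 10] and refined by `x != 0`, `may_be(&x, 0)` becomes false, so a division `100 / x` can be proven free of division by zero, which an interval alone cannot express.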


Improved precision for goto-based loops. Since the analyzer iterates on the
syntax of the program, goto statements require the use of flow tokens [12] and
a special fixpoint iteration scheme. We added support for a decreasing iteration
pass, which recovers some of the precision lost to the generalization performed
by the widening operator. In addition, we added a syntactic loop rewriting
pass, which turns a few special goto patterns into equivalent while loops that are
analyzed more precisely.
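The effect of a decreasing iteration pass can be seen on intervals for the loop `i = 0; while (i < bound) i++;`. The C sketch below uses the textbook interval widening followed by plain re-applications of the loop-head transfer function. It works on a structured loop rather than on Mopsa's flow-token scheme for goto, and all function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

#define INF LLONG_MAX  /* +infinity for upper bounds */

typedef struct { long long lo, hi; } itv;  /* non-empty interval [lo, hi] */

static itv join(itv a, itv b) {
    itv r = { a.lo < b.lo ? a.lo : b.lo, a.hi > b.hi ? a.hi : b.hi };
    return r;
}

/* Standard interval widening: unstable bounds jump to infinity. */
static itv widen(itv a, itv b) {
    itv r = { b.lo < a.lo ? LLONG_MIN : a.lo, b.hi > a.hi ? INF : a.hi };
    return r;
}

/* Loop-head transfer function for  i = 0; while (i < bound) i++;
 *   F(X) = [0,0] join ((X meet [-inf, bound-1]) + 1)            */
static itv loop_head(itv x, long long bound) {
    itv init = { 0, 0 };
    itv body = x;
    if (body.hi > bound - 1) body.hi = bound - 1;   /* assume the guard */
    if (body.lo > body.hi) return init;             /* body unreachable */
    body.lo += 1;                                   /* i++ */
    body.hi = (body.hi == INF) ? INF : body.hi + 1;
    return join(init, body);
}

/* Increasing iterations with widening until stable, then a number of
 * decreasing iterations that re-apply F to recover precision. */
static itv analyze(long long bound, int decreasing_steps) {
    itv x = { 0, 0 };
    for (;;) {
        itv next = widen(x, loop_head(x, bound));
        if (next.lo == x.lo && next.hi == x.hi) break;
        x = next;
    }
    for (int k = 0; k < decreasing_steps; k++)
        x = loop_head(x, bound);
    return x;
}
```

For `bound = 100`, widening alone yields the loop invariant i ∈ [0, +∞]; a single decreasing step refines it to i ∈ [0, 100], so the exit state becomes i = 100.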


Precise stub initialization. Ouadjaout and Miné [16] implemented a stub 
language and its interpretation for the C standard library in Mopsa. Contiguous
region initialization through functions such as memset was not handled precisely
by our implementation of the cells domain [11], mainly for scalability reasons. We
improved the domain to handle region initialization up to a given bound, and NULL
pointer synthesis from a contiguous block of 0 bytes.
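The two ideas, bounded precise initialization and NULL synthesis from zero bytes, can be illustrated with a byte-level C model. This is a simplification, not the cells domain [11], which works on typed cells at offsets; `INIT_BOUND`, `abs_memset`, and `read_is_null` are hypothetical names, and the sketch assumes a platform where the null pointer is all-bits-zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BLOCK_SIZE 64   /* size of the modelled memory block, in bytes */
#define INIT_BOUND 32   /* hypothetical bound for precise initialization */

/* Per-byte abstract value: a known constant byte, or unknown. */
typedef struct { bool known; unsigned char val; } cell;
typedef struct { cell bytes[BLOCK_SIZE]; } block;

static void block_havoc(block *b) {
    for (size_t k = 0; k < BLOCK_SIZE; k++) b->bytes[k].known = false;
}

/* Abstract memset(b + off, c, len): precise up to INIT_BOUND bytes,
 * otherwise the affected bytes are conservatively made unknown. */
static void abs_memset(block *b, size_t off, unsigned char c, size_t len) {
    bool precise = len <= INIT_BOUND;
    for (size_t k = off; k < off + len && k < BLOCK_SIZE; k++) {
        b->bytes[k].known = precise;
        b->bytes[k].val = c;
    }
}

/* NULL synthesis: a pointer-sized read whose bytes are all known to be 0
 * is interpreted as the NULL pointer (assumes all-bits-zero NULL). */
static bool read_is_null(const block *b, size_t off) {
    if (off + sizeof(void *) > BLOCK_SIZE) return false;
    for (size_t k = off; k < off + sizeof(void *); k++)
        if (!b->bytes[k].known || b->bytes[k].val != 0) return false;
    return true;
}
```

After zeroing 16 bytes, both pointer-sized slots in that range read as NULL; zeroing all 64 bytes exceeds the bound, so nothing can be concluded, mirroring the scalability trade-off described above.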


Other improvements. Some SV-Comp programs have specific symbolic ar- 
gument initialization performed by client code, with variable parameters on the 
maximal size of all symbolic arguments. We have thus extended Mopsa to handle 
a wide range of parameters for symbolic argument initialization, matching those 
found in SV-Comp programs. We also rely on the flambda optimizer for OCaml,
which brings a performance improvement of more than 15%.


2 Software Architecture: the SV-Comp driver 


By default, the C analysis of Mopsa detects all the runtime errors that may hap- 
pen in the analyzed program, while SV-Comp tasks focus on a specific property 
at a time. We thus rely on an SV-Comp specific driver. It takes as input the 
task description (program, property, data model). It runs increasingly precise C 
analyses defined in Mopsa until the property of interest is proved or the most 
precise analysis is reached (or the resources are exhausted). Each analysis result 
is postprocessed by the driver to check if the property is proved. 
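The driver's control flow can be sketched in C as a loop over configurations ordered from cheapest to most precise. The types and names (`analysis_fn`, `drive`, the toy configurations) are hypothetical, and treating a resource-exhausted run as the end of the sequence is an assumption about how the global budget is handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Outcome of one analysis run (hypothetical encoding). */
typedef enum { PROVED, NOT_PROVED, OUT_OF_RESOURCES } outcome;
typedef enum { VERDICT_TRUE, VERDICT_UNKNOWN } verdict;

/* An analysis configuration: analyzes a task w.r.t. one property. */
typedef outcome (*analysis_fn)(const char *task, const char *property);

/* Run configurations from cheapest to most precise and stop as soon as
 * the property is proved. Stopping on OUT_OF_RESOURCES is an assumption
 * modelling an exhausted global budget. *runs receives the number of
 * configurations that were executed. */
static verdict drive(const analysis_fn *confs, size_t n_confs,
                     const char *task, const char *property, size_t *runs) {
    *runs = 0;
    for (size_t k = 0; k < n_confs; k++) {
        outcome o = confs[k](task, property);
        *runs = k + 1;
        if (o == PROVED) return VERDICT_TRUE;   /* correctness proved */
        if (o == OUT_OF_RESOURCES) break;
    }
    return VERDICT_UNKNOWN;  /* Mopsa does not report violations */
}

/* Toy stand-ins for two configurations, for illustration only. */
static outcome cheap_conf(const char *task, const char *property) {
    (void)property;
    return strcmp(task, "easy.c") == 0 ? PROVED : NOT_PROVED;
}

static outcome precise_conf(const char *task, const char *property) {
    (void)property;
    if (strcmp(task, "huge.c") == 0) return OUT_OF_RESOURCES;
    return strcmp(task, "easy.c") == 0 || strcmp(task, "hard.c") == 0
               ? PROVED : NOT_PROVED;
}
```

Ordering configurations by cost means that easy tasks are settled by the cheapest analysis, and only the remaining ones pay for the more expensive domains.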

An analysis configuration defines the set of domains used and their parameters,
which allow trading precision for efficiency. A breakdown of the
results is shown in Fig. 1. This year, we use five configurations. Conf. 1 relies on
intervals and cells [11]. Conf. 2 additionally enables the string length domain [9],
the excluded powerset domain, and congruences. It performs decreasing iterations
for goto statements, unrolls the first 10 iterations of loops, and enables the
enhanced memory allocation abstraction and the more precise evaluation of
stubs. Conf. 3 adds a polyhedra abstract domain, relying on a static packing
to scale [7]. This includes tracking numerical relations between string lengths
and scalar variables. A pointer sentinel domain is added to symbolically track
the position of the first NULL value of a pointer array. Decreasing iterations are
also enabled for for/while loops, and the first 15 iterations of loops are unrolled.
Conf. 4 adds the symbolic rewriting domain of Boillot and Feret [4]. Loop unrolling
is extended to 60 iterations. Conf. 5 performs a fully relational analysis
of the analyzed program without packing.


Max. Conf.   Tasks proved correct   Tasks yielding timeout
1            6995                   368
2            7775 (+780)            717 (+349)
3            8197 (+422)            2954 (+2237)
4            8257 (+60)             3527 (+573)
5            8400 (+143)            9532 (+6005)


Fig. 1. Max. Conf. i represents the sequence of increasingly precise analyses from
Conf. 1 up to Conf. i. Max. Conf. 2 is able to prove 780 tasks correct in addition to the
6995 proved by Conf. 1, although 717 tasks reach the resource limits when analyzed by
Conf. 1 and 2 (349 more than by Conf. 1 alone). There are 25885 tasks in total, and
17851 correctness tasks. Mopsa can only prove program correctness for now (68% of
the tasks); it yields "unknown" when unable to prove a program correct.


Witness Validation. We extended our driver to support the witness validation
phase of SV-Comp: we inject the loop invariants of a witness, encoded as assertions,
into the original program. We then check that this patched program is correct.
This approach is similar to MetaVal's [3], but we use the new YAML format. The
work of Saan et al. [22] is more involved: it leverages the witness to guide the
analysis, yielding precision improvements over the analysis without a witness.
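As an illustration of the patching step, suppose a witness (hypothetical here) states the loop-head invariant `i >= 0 && 2*s == i*(i-1)` for a summation loop. The C sketch below shows where the injected assertion lands. The helper `verifier_assert` is a local stand-in for SV-COMP's `__VERIFIER_assert` so the sketch runs natively; during validation, the assertion would instead be proved statically by the analysis.

```c
#include <assert.h>

/* Local stand-in for SV-COMP's __VERIFIER_assert, so the sketch runs
 * natively; during validation the assertion is proved statically. */
static void verifier_assert(int cond) { assert(cond); }

/* Patched program: the loop computes 0 + 1 + ... + (n-1). The
 * (hypothetical) witness invariant
 *   i >= 0 && 2*s == i*(i-1)
 * is injected as an assertion at the loop head. */
long patched_sum(int n) {
    long s = 0;
    int i = 0;
    for (;;) {
        verifier_assert(i >= 0 && 2 * s == (long)i * (i - 1)); /* injected */
        if (!(i < n)) break;   /* original loop guard */
        s += i;                /* original loop body */
        i++;
    }
    return s;
}
```

If the analysis proves the patched program correct, the invariant holds at every visit of the loop head, and the witness is accepted.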


3 Strengths and Weaknesses 


Mopsa participated in the following categories, targeting C programs: Reach- 
Safety, MemSafety, NoOverflows and SoftwareSystems. It did not compete in 
the termination category and cannot precisely analyze concurrency-related veri- 
fication tasks. The highlight of this year's participation is Mopsa's gold medal in 
the SoftwareSystems track, focusing on verifying real software systems. Mopsa 
scored 2.5 times more points than the second tool, Bubaak-SpLit [5]. Figure 2 
breaks down the results of Mopsa in the subcategories of the SoftwareSystems 
track, highlighting our progress, and the best results obtained by this year's 
verifiers. An overview of results can be found in the competition report [2]. 


Strengths. Mopsa is quite scalable: our cheapest configuration is able to analyze
a given program within the allocated resource budget in 98.6% of the cases. In
addition, Mopsa is the only verifier of 2023 and 2024 able to gain points in the
DDLL category, corresponding to large instances of instrumented Linux drivers.
Category    Prop.  |tasks|  Mopsa'23  Mopsa'24  Best score (2024)
AWS         R      197      32        36        137   Symbiotic
coreutils   M      140      0         0         0     -
coreutils   N      30       0         4         4     Mopsa
BusyBox     N      54       4         8         8     Mopsa
DDL         R      2442     3174      3476      3476  Mopsa
DDLL        R      8        10        14        14    Mopsa
DDL         M      141      0         8         71    Bubaak-SpLit
other       R      22       0         10        10    Mopsa
other       M      34       0         12        12    Mopsa
uthash      R      138      0         192       228   Bubaak*, Symbiotic
uthash      M      138      0         96        204   Bubaak*, Symbiotic
uthash      N      114      0         204       204   Mopsa


Fig. 2. Mopsa’s improvements for subcategories of the SoftwareSystems track. Property 
is either ReachSafety, MemSafety or NoOverflow. The last three columns show the 
score of Mopsa submitted last year, this year, and the best score reached by a verifier. 


Mopsa is committed to being sound. Thanks to this, we were able to fix 20
mislabeled verdicts this year, mainly in the DDL (DeviceDriversLinux) category.


Weaknesses. Mopsa can only prove programs correct for now, and is currently 
unable to provide counterexamples otherwise. We plan to leverage the recent 
work of Milanese and Miné [10] to address this issue. Our SV-Comp driver cur- 
rently tries a fixed sequence of increasingly precise configurations. We plan to 
reuse information between the different analyses of the sequence, and automati- 
cally adapt the options of Mopsa to the analyzed program (similar to Goblint's 
autotuning [21]). Our analysis is not competitive enough in tracks other than
SoftwareSystems: we plan to add new array abstractions as well as a partitioning
mechanism. We also noted that Mopsa is imprecise on longjmp, following the
addition of recent benchmarks from Schwarz et al. [23] to SV-Comp.


Methodology. We finish this section by explaining how we worked to improve
Mopsa this year. We focused on the most important subcategories of SoftwareSystems.
We encountered a few runtime errors in our analysis: we used
automated test-case reduction [18] to pinpoint these issues and fix them. We
investigated several timeouts in the DeviceDriversLinux-Large (DDLL) category
using standard profiling tools (such as perf), but also custom plugins that
profile which parts of a given program take long to analyze. The rest
of the work consisted of manually inspecting some tasks to see how
we could improve precision. We started by choosing tasks solved by competing
tools relying on similar approaches, starting with Goblint [20, 21, 19].


4 Software Project and Contributors 


Mopsa is available on GitLab [17] and released under the GNU LGPL v3 license.
Mopsa was originally developed at LIP6, Sorbonne Université following an ERC
Consolidator Grant award to Antoine Miné. Mopsa is now additionally developed
in other places, including Inria, ENS, Airbus, and Nomadic Labs. The people who
improved Mopsa for SV-Comp 2024 are the authors of this paper. 
Data-Availability Statement. The exact version of Mopsa and the driver
that participated in SV-Comp 2024 are available as a Zenodo archive [13]. 
Abstract. PROTON is a tool to check whether a given C program has
a non-terminating behaviour or not. It is built around the C Bounded 
Model Checker (CBMC). CBMC cannot prove non-termination directly, 
as all non-terminating runs are unbounded. PROTON annotates the 
loops in a given program with assertions that check for a recurrent pro- 
gram state. Violation of such an assertion shows the existence of a re- 
current state and thereby proves non-termination. PROTON also trans- 
forms the violating trace returned by CBMC into a non-termination 
witness for the program. 


1 Introduction 


Given a program P for which we want to check termination under all inputs, 
a checker should either provide a witness for non-termination of P, or give a correct 
verdict that P always terminates. For termination checking, PROTON reuses 
the high confidence, but unsound, technique used in VeriFuzz 1.4 [9]. For proving 
non-termination, PROTON implements a novel sound technique that attempts 
to discover recurrent states inside loops. A recurrent state (RS) is a program 
state at the head of a loop such that (1) RS entails the loop guard; (2) RS is 
reachable from an initial state in some valid program execution and (3) RS is 
reachable from itself after the loop body is executed. This notion of an RS is a 
strengthening of the recurrent set definition proposed in [5]. 

Consider the example program P in Listing 1.1, adapted from the SV-COMP
benchmark WhileSingle.c. This program does not terminate for any nondet
value ≤ 3. For example, if the nondet value on Line 1 is 3, then the if-condition
on Line 3 evaluates to false and hence the value of i remains unchanged,
causing the loop to run forever. PROTON works in three main phases, as
described below.
Listing 1.1. Program P

1  i = __VERIFIER_nondet_int();
2  while (i < 10) {
3    if (i != 3) {
4      i = i+1;
5    }
6  }

Listing 1.2. Program P'

1  i = __VERIFIER_nondet_int();
2  bool pStored0 = false;
3  while (i < 10) {
4    bool flag = __VERIFIER_nondet_bool();
5    static int oi; if(pStored0)
6      {__CPROVER_assert(!(oi==i),"RSF");}
7    if(flag){oi=i;pStored0=true;}
8    {if (i != 3) { i = i+1; }}}


Phase 1 - Program Instrumentation: PROTON instruments each loop in
P with a __CPROVER_assert to check for a recurrent state. This is illustrated
in Listing 1.2. PROTON first parses P using various Clang/LLVM APIs and
collects the set of all program variables visible in the scope of each loop L_k in
P. Following this, PROTON instruments each L_k as follows:


- A boolean variable pStored_k is introduced just before the loop guard of L_k,
  and initialized to false (Line 2 of Listing 1.2).

- Another boolean variable flag is added inside the loop, immediately after the
  guard condition, and is nondeterministically initialized (Line 4).

- For each variable i visible in the scope of L_k, a corresponding static variable
  oi (i.e., i prefixed with o) is added, which tracks the "old" value of i (Line 5).

- An assertion that the "old" state of P never repeats in any later iteration of
  L_k is added (Lines 5 and 6).

- If flag is true, then the program state is stored as shown on Line 7, and
  pStored_k is set to true.

- Lastly, PROTON emits the loop body as is, but enclosed in braces (Line 8).


The above instrumentation ensures that the assertion is checked (due to
the if-condition on Line 5) in every iteration after the one in which the state
is stored, as pStored_k is set to true only after this if-statement. So, in the very
iteration in which the program state is stored, the assertion is not checked.
This encoding allows a bounded model checker like CBMC [8] to check whether the
program state stored during a nondeterministically chosen iteration of L_k recurs
during any subsequent iteration of L_k, subject to the loop iteration bound used
for checking.
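To see what the instrumented program P' of Listing 1.2 checks, the C sketch below executes it concretely. CBMC explores all initial values of i and all choices of flag symbolically; here the caller fixes the initial value (`init_i`) and the iteration at which the state is stored (`store_iter`), and `max_iters` plays the role of the unwind bound. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Concrete simulation of P' from Listing 1.2. Returns true if the
 * "RSF" assertion would be violated, i.e. the state stored at iteration
 * store_iter recurs within max_iters loop iterations. */
static bool finds_recurrent_state(int init_i, int store_iter, int max_iters) {
    int i = init_i;
    bool pStored0 = false;
    int oi = 0;
    for (int iter = 0; iter < max_iters && i < 10; iter++) {
        bool flag = (iter == store_iter);      /* the nondet choice, fixed */
        if (pStored0 && oi == i) return true;  /* assert(!(oi==i)) fails */
        if (flag) { oi = i; pStored0 = true; } /* store the current state */
        if (i != 3) { i = i + 1; }             /* original loop body */
    }
    return false;
}
```

Starting from i = 3 and storing the state in the first iteration, the state recurs one iteration later. Starting from i = 0, a recurrent state is found only if the state is stored once i has reached 3, which is exactly the kind of choice CBMC can make nondeterministically.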


Phase 2 - Bounded Model Checking for recurrent states: After instrumenting
P, PROTON iteratively invokes CBMC with different unwind bounds, up to a
pre-configured maximum unwind bound (empirically chosen to be 1000 for
SV-COMP 2024) and within a pre-configured time limit (set to 2 minutes for
SV-COMP 2024). If the recurrent state assertion is ever violated, this proves the
presence of a recurrent state and hence non-termination. When this happens,
CBMC generates a corresponding counterexample trace. During Phase 1 described
above, PROTON performs additional instrumentation (not shown in Listing 1.2 for
want of space) to help generate a corresponding non-termination witness in the
GraphML format. If the recurrent state assertion is not violated up to the maximum
bound, or if the check times out, PROTON moves to Phase 3, described below.
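The Phase 2 loop can be sketched in C as follows. The text only states that different unwind bounds are tried up to a maximum of 1000 within a time limit, so the doubling schedule, the `bmc_fn` interface standing in for one CBMC run, and the toy checkers are all assumptions for illustration; PROTON itself drives CBMC from a bash script.

```c
#include <assert.h>
#include <time.h>

typedef enum { RSA_VIOLATED, RSA_HOLDS_UP_TO_BOUND } bmc_result;

/* Hypothetical interface standing in for one CBMC run with --unwind k. */
typedef bmc_result (*bmc_fn)(unsigned unwind);

/* Returns the unwind bound at which a recurrent state was found, or 0 if
 * none was found up to max_unwind or within time_limit_s seconds of CPU
 * time. The doubling schedule is an assumption. */
static unsigned iterative_rsa_check(bmc_fn bmc, unsigned max_unwind,
                                    double time_limit_s) {
    clock_t start = clock();
    for (unsigned unwind = 1; ; unwind *= 2) {
        if (unwind > max_unwind) unwind = max_unwind;
        if (bmc(unwind) == RSA_VIOLATED) return unwind;  /* non-termination */
        if (unwind == max_unwind) return 0;              /* bound exhausted */
        if ((double)(clock() - start) / CLOCKS_PER_SEC > time_limit_s)
            return 0;                                    /* time limit hit */
    }
}

/* Toy checkers: a recurrent state reachable with 40 unwindings, and none. */
static bmc_result needs_40(unsigned unwind) {
    return unwind >= 40 ? RSA_VIOLATED : RSA_HOLDS_UP_TO_BOUND;
}
static bmc_result never(unsigned unwind) {
    (void)unwind;
    return RSA_HOLDS_UP_TO_BOUND;
}
```

Starting with small bounds keeps the common case cheap, consistent with the observation in Section 3 that many recurrent states appear at shallow unwinding depths.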

Phase 3 - Value-Bounded Termination Check: In this phase, entered
only if PROTON could not find a non-termination witness in Phase 2, PROTON
invokes the termination check of VeriFuzz 1.4 [9], which is reimplemented in
PROTON, for a pre-configured time limit (set to 2 minutes for SV-COMP 2024).


2 Software Architecture 


[Figure: input program P; Iterative RSA Check (CBMC); YES: Generate Witness, Report NT; NO: VeriFuzz 1.4 Termination Check; otherwise UNKNOWN or ERROR.]

Fig. 1. PROTON architecture


Currently PROTON checks only termination and non-termination of programs.
Figure 1 shows the tool flow of PROTON. Given an input program P,
PROTON first invokes Bracer, which simply adds curly braces around all loop
bodies in P to produce P_b. PROTON then invokes Instrumenter on P_b, which
instruments P_b as described in Phase 1 to produce P_i. Sometimes, due to internal
errors, the Instrumenter may not be able to instrument the program. So,
PROTON then checks whether P_i has at least one Recurrent State Assertion (RSA).
If so, it performs the non-termination check as described in Phase 2 above and
generates a corresponding witness if it detects non-termination.

If P_i does not contain any RSA, or if this non-termination check is unsuccessful,
PROTON then invokes the confidence-based termination check on P mentioned
in Phase 3 above. If this termination check concludes that P terminates,
PROTON reports P to be terminating. Otherwise, PROTON reports either UNKNOWN
(when both checks failed) or ERROR (if there is any internal error).
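The decision logic of this tool flow can be summarized in a few lines of C. PROTON implements the flow as a bash script; this rendering, the `stage_reports` record, and the result names are hypothetical and only capture the order in which the outcomes of the stages are consulted.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    PROTON_NON_TERMINATING, PROTON_TERMINATING, PROTON_UNKNOWN, PROTON_ERROR
} proton_result;

/* What each stage reported (hypothetical encoding of the script's state). */
typedef struct {
    bool internal_error;  /* e.g. the Instrumenter failed on the program */
    int  num_rsa;         /* recurrent-state assertions in the instrumented program */
    bool rsa_violated;    /* Phase 2: CBMC violated an RSA */
    bool terminates;      /* Phase 3: termination check concluded "terminates" */
} stage_reports;

static proton_result proton_flow(stage_reports r) {
    /* Phase 2 is only attempted when at least one RSA was inserted. */
    if (r.num_rsa > 0 && r.rsa_violated)
        return PROTON_NON_TERMINATING;   /* a witness is generated here */
    /* Otherwise fall back to the Phase 3 termination check on P. */
    if (r.terminates)
        return PROTON_TERMINATING;
    return r.internal_error ? PROTON_ERROR : PROTON_UNKNOWN;
}
```

Note that an instrumentation failure does not by itself produce ERROR: the termination check still runs and may conclude that P terminates.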

PROTON is built using CBMC v5.95.0 [8] with Z3 4.12.2 and Glucose
Syrup [1] as the backend SMT and SAT solvers, respectively. The Bracer and
Instrumenter were implemented in C++ using the clang-14 and llvm-14 libraries.
The tool flow is implemented in a bash shell script.
3 Strengths and Weaknesses


Here we present our analysis of strengths and weaknesses of PROTON’s non- 
termination check, as that is the main novelty of PROTON. 

Strengths: Of the 818 Non-termination tasks in SV-COMP 2024 [2], PROTON 
correctly solved 627, out of which 501 witnesses could be successfully validated. 
There are 18 tasks, all from the systemc directory, such as token_ring.10.cil-1.c
and transmitter.08.cil.c, which PROTON was the only tool in the competition
to identify as non-terminating. These programs have several
function calls and while loops, with around 1000 lines of code. However,
none of the corresponding witnesses generated by PROTON could be validated.
Further, the total time taken by PROTON for the 818 tasks is 37000 seconds,
which is well below that of other top tools such as ULTIMATE Automizer [6] (correctly
solved: 548, confirmed: 537, time: 100000 seconds) and 2LS [8] (correctly solved:
685, confirmed: 484, time: 52000 seconds). This shows that PROTON's approach
of checking for recurrent states at shallow loop unwinding depths is both effective
and efficient.

Weaknesses: As mentioned above in Phase 2 - Bounded Model Checking for
recurrent states, PROTON checks for a recurrent state only up to an unwind bound
of 1000 in SV-COMP 2024. Therefore, it cannot handle cases where recurrent
states occur beyond this unwind bound, such as cohencui-both-nt.c, where
the first recurrent state occurs after 2?? iterations. Another technical limitation
of our approach is the inability to handle arrays, as this requires instrumenting
each array element, which does not scale for large arrays. So, we currently
ignore loops that modify arrays, and hence could not solve cases such as
Arrays02-EquivalentConstantIndices.c. Also, since our instrumenter does
not handle recursion currently, PROTON could not identify benchmarks like
RecursiveNonterminating-1.c as non-terminating. Lastly, due to a bug in the
instrumenter, one pointer was not tracked, leading PROTON to incorrectly
report a program as non-terminating.


4 Tool Configuration and Setup 


PROTON comes with an MIT license and is publicly available on GitHub. To install
and run the tool, follow the instructions in the file named README.txt.
The benchexec tool-info module is PROTON.py and the benchmark definition
file is PROTON.xml. A sample run command is: PROTON --graphml-witness
witness.graphml --propertyfile termination.prp --64 example.c.


PROTON opted to participate only in the Termination category in SV-COMP 2024. 


5 Software Project and Contributors 


PROTON is developed and maintained by the authors at IIT Delhi, TCS Re- 
search, and IIT Bombay. We thank everyone who has contributed to the devel- 
opment of PROTON, Clang and LLVM Infrastructure, CBMC, Glucose Syrup, 
and Z3. 
Data-Availability Statement


PROTON is publicly available at https://github.com/kumarmadhukar/term 


The SV-COMP 2024 competition version of PROTON is available at Zenodo:
https://doi.org/10.5281/zenodo.10185252. For any queries, please contact the
authors.
Abstract. SWAT is a novel dynamic symbolic execution engine for Java
applications utilizing dynamic instrumentation. SWAT’s unique modular 
design facilitates flexible communication between its symbolic explorer 
and executor using HTTP endpoints, thus enhancing adaptability to di- 
verse application scenarios. The symbolic executor’s ability to attach to 
Java applications enables efficient constraint generation and path explo- 
ration. SWAT employs JavaSMT for constraint generation and ASM for 
bytecode instrumentation, ensuring robust performance. SWAT's efficacy 
is evaluated in the Java Track of SV-COMP 2024, achieving fourth place. 


Keywords: Dynamic Symbolic Execution - Java - Dynamic Instrumen- 
tation 


1 Verification Approach 


The symbolic execution of a System-under-Test (SuT) is a well-known verifica- 
tion technique where the state space is systematically explored by using con- 
straint modeling to compute new valid inputs for the SuT. Dynamic Symbolic 
Execution (DSE), in particular, has shown recent successes with JDart [15] win- 
ning the Java track of SV-COMP 2022 [4] as the first DSE tool and GDart 
achieving second place in 2023 [5]. Generally, DSE utilizes a symbolic executor 
to evaluate a SuT by observing the concrete execution for a given assignment 
of the symbolic variables. Constraints are recorded during execution, reflecting 
all operations involving symbolic variables. In particular, each branching point 
that depends on a symbolic variable is modeled as a path constraint. After the 
execution terminates, the symbolic explorer can select a previously unexplored 
branch. Given the recorded constraints, an SMT solver is used to determine 
whether a model for the symbolic variables under the given constraints exists. If 
so, a concrete instantiation for each value can be obtained to drive execution to 
previously unexplored regions of the state space. By repeating this process, the 
state space of the SuT can be systematically explored. 
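The exploration loop described above can be made concrete with a small C sketch on a toy SuT with two symbolic branches. It is an illustration under stated assumptions, not SWAT's implementation: recorded constraints are modelled as predicate function pointers, a brute-force search over [-100, 100] stands in for the SMT solver, and paths are processed breadth-first. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEPTH 4
#define MAX_PATHS 16

typedef bool (*pred)(int x);
/* A path constraint: the branch predicates met and the outcome taken. */
typedef struct { pred p[MAX_DEPTH]; bool taken[MAX_DEPTH]; int n; } path;

static bool gt10(int x) { return x > 10; }
static bool mod7is3(int x) { return x % 7 == 3; }

/* Toy SuT, instrumented to record one constraint per symbolic branch. */
static int sut(int x, path *pc) {
    pc->n = 0;
    bool c0 = gt10(x);
    pc->p[pc->n] = gt10; pc->taken[pc->n++] = c0;
    if (!c0) return 0;
    bool c1 = mod7is3(x);
    pc->p[pc->n] = mod7is3; pc->taken[pc->n++] = c1;
    return c1 ? 2 : 1;
}

/* Solver stand-in: find x satisfying constraints 0..k-1 and NOT k. */
static bool solve_flip(const path *pc, int k, int *model) {
    for (int x = -100; x <= 100; x++) {
        bool ok = pc->p[k](x) != pc->taken[k];
        for (int j = 0; j < k && ok; j++) ok = pc->p[j](x) == pc->taken[j];
        if (ok) { *model = x; return true; }
    }
    return false;
}

/* Was the branch obtained by flipping position k of pc already explored? */
static bool seen(const path *paths, int n_paths, const path *pc, int k) {
    for (int i = 0; i < n_paths; i++) {
        if (paths[i].n <= k) continue;
        bool same = paths[i].taken[k] != pc->taken[k];
        for (int j = 0; j < k && same; j++) same = paths[i].taken[j] == pc->taken[j];
        if (same) return true;
    }
    return false;
}

/* Explore breadth-first from an initial input; returns the number of
 * executed paths and sets bit r of *results for every SuT result r. */
static int explore(int initial, unsigned *results) {
    path paths[MAX_PATHS];
    int n_paths = 0;
    *results = 1u << sut(initial, &paths[n_paths++]);
    for (int i = 0; i < n_paths; i++) {
        for (int k = 0; k < paths[i].n; k++) {
            int x;
            if (seen(paths, n_paths, &paths[i], k)) continue;
            if (n_paths == MAX_PATHS || !solve_flip(&paths[i], k, &x)) continue;
            *results |= 1u << sut(x, &paths[n_paths++]);  /* new concrete run */
        }
    }
    return n_paths;
}
```

Starting from x = 0, the first run only covers the `x <= 10` branch; negating its constraint yields x = 11, and negating the second constraint of that run yields x = 17, so all three paths are covered after three concrete executions.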

JDart, the winning candidate from 2022, relies on Java Pathfinder (JPF) 
[9] and its implementation of the Java Virtual Machine (JVM) for symbolic 
execution [15]. While the JPF-JVM offers robust analysis tools, it limits JDart’s
applicability and causes a significant overhead. Coastal [8], on the other hand, 
relies on a standard JVM. The symbolic execution is realized using dynamic 
instrumentation. While Coastal provides a loosely coupled design between the 
symbolic execution engine and the symbolic explorer, both components are still 
located in the same Java program used as the driver to start and execute the 
SuT symbolically. GDart extends the notion of modularity introduced by Coastal 
with a fully decoupled explorer and executor that communicate using a custom 
protocol [16]. SWAT offers a fully modular design comparable to GDart while 
relying on HTTP endpoints for communication between the symbolic explorer 
and executor. In addition, GDart relies on the GraalVM for driving symbolic 
execution while SWAT attaches to the SuT, thus enabling symbolic execution 
inside native JVM implementations. 


2 System Architecture 


SWAT’s decoupled design allows for a persistent symbolic explorer that receives 
relevant information from instances of the symbolic executor. The executor ob- 
serves the SuT by attaching to the JVM and adding symbolic capabilities using 
dynamic instrumentation. An overview of the design and interaction between the
different components is shown in Figure 1 and described in more detail below.

Symbolic executor The executor attaches to the JVM running the SuT via 
the Java agent interface and dynamically instruments each class at load time with 
additional (non-interfering) instructions that dynamically build and manage a 
symbolic shadow state responsible for maintaining the symbolic constraints. This 
leads to a symbolic executor that does not actively drive symbolic execution and 
instead records relevant information during normal execution. SWAT uses the
ASM framework [6] for bytecode manipulation via the java.lang.instrument
API [17]. Historically, this part builds on CATG as a basis for dynamic
symbolic execution. Significant parts of CATG have been reworked, and the language
support has been lifted to Java 17, including most of its features. The symbolic shadow
state and the symbolic constraint handling have been extended and wholly rewritten to
use the API offered by JavaSMT [1] as an abstraction layer between constraint
generation and the solver. The symbolic scope and variables, as well as the 
entry and exit points for symbolic tracking, are fully configurable, allowing for 
broad applicability of the system. The instrumentation logic is also modularized, 
allowing us to easily extend SWAT to various use cases, such as the SV-COMP. 

When the execution of the SuT reaches a symbolic entry point, the symbolic 
executor records control-flow information as well as the constraints, and after the 
exit point has been reached, both the trace and the corresponding constraints are
sent to the symbolic explorer using HTTP requests. Constraints are serialized
using the SMT-LIB v2 [3] format.

Symbolic explorer The explorer, written in Python using the FastAPI
web framework, receives the language-agnostic trace and constraint information.
These are stored in a binary execution tree. The tree can be searched using a
configurable and modularized strategy to select unexplored branches. To obtain
new inputs, the constraints are sent to Z3 [14]. The inputs can either be made
available to external drivers, such as fuzzers, through an endpoint or, in the case of
SV-COMP, be used directly to initiate a new concrete execution. This structure
makes SWAT widely applicable and even enables straightforward testing of web
services, for example, where each controller is configured as the entry and exit
point and user-controlled values are tracked symbolically. This allows the same
JVM to keep running between symbolic runs and even allows for multiple
(non-interfering) executions in parallel.

Fig. 1. Schematic overview of SWAT’s modular architecture.


3 Evaluation 


In its first participation in the Java category of SV-COMP 2024, SWAT reached
fourth place with 566 out of 828 total points, while MLB [7], the winning candidate,
scored 676 points. Overall, SWAT correctly classified 68% of the test cases.
Figure 2a visualizes the result distribution for test cases containing violations
and those without. The number of correctly classified cases is similar for both
groups. However, due to issues during witness generation, several correctly
identified violations did not produce correct witnesses. Hence, without considering
the witnesses, the share of identified violations rises significantly from 68% to
83%. Generally, DSE frameworks are expected to be better at identifying violations
(one concrete path) than at proving their absence (full state-space exploration). This
is also reflected in the distribution of timeouts, which are five times as frequent for
safe test cases as for test cases containing violations. Roughly 10% of test cases are
labeled as unknown by SWAT. This category comprises several possibilities: out-of-scope
invocations without a symbolic model, the inability to determine satisfiability, or
unsupported behavior such as uncaught exceptions.

Fig. 2. SWAT results divided based on the ground truth of each test case (a) and
results for each subcategory of the Java category (b).

Further dividing the results based on the different subgroups (see Figure 2b) 
highlights differences in the status distributions. SWAT generally performs well 
for regression test categories, as these usually test specific functionalities, re- 
sulting in small programs that do not lead to a state space explosion. With the 
increasing complexity of test suites, the number of timeouts is expected to rise. 
The Jayhorn recursive test cases cause many timeouts as SWAT currently does 
not support advanced recursion handling. Lastly, SWAT is entirely unable to
solve the test cases provided by the Juliet test suite due to their extensive use of
socket connections, which require explicit mocking.

While the results demonstrate the impact of state space explosion on the
performance of DSE engines, they generally highlight the potential of
SWAT, especially considering the overhead incurred by starting a new
JVM instance for each run of the test case. In SWAT’s current form, this requires
instrumentation at each iteration, whereas test cases that can be re-initiated
without restarting the JVM would result in significantly faster executions.


4 Software Project 


SWAT is developed by the Institute for IT Security at the University of Lübeck
and published on GitHub under the BSD 2-Clause license. Installation instructions,
documentation, and examples can be found on our GitHub page [11]. Global
configuration options chosen for the participation include the exclusive use
of the Z3 [14] solver, a breadth-first search strategy, and SV-COMP-specific
driver modules inside the symbolic explorer and executor.

Data-Availability Statement The version of SWAT used for the SV- 
COMP 2024 Java category is available at Zenodo [13] and on GitHub [EO]. 
Abstract. SYMBIOTIC 10 brings four substantial improvements. First,
we extended our clone of KLEE called JETKLEE with lazy memory
initialization. With this extension, JETKLEE can symbolically execute a
function without knowing its context. In SV-COMP, we use it to handle
extern variables. Second, we have implemented the technique called
compact symbolic execution in SLOWBEAST. Third, we have implemented
a non-trivial may-happen-in-parallel analysis, which improves slicing of
parallel programs. Finally, we have implemented support for violation
witnesses in the new witness format 2.0.


1 Verification Approach 


Just like previous versions, SYMBIOTIC 10 relies on a combination of static anal- 
ysis, code instrumentation, and several flavors of symbolic execution (SE) [8]. It 
employs two symbolic executors: SLOWBEAST and our fork of KLEE [2] called 
JETKLEE. SLOWBEAST implements standard (forward) SE, backward SE with 
loop folding [5], and compact SE [13]. JETKLEE implements standard SE. 

The rest of the section describes the precise workflow for various types of 
properties and discusses the differences between SYMBIOTIC 10 and SYMBI- 
OTIC 9.1, which is the version that competed in SV-COMP 2023. 


Verification of the Property unreach-call For this property, SYMBIOTIC 10 
performs slicing of the given program to remove the parts that have no influence 
on reaching the target function, and executes a sequential portfolio of the following
engines. Each of the engines is executed for the given number of seconds. The 
execution can be shorter if the engine decides or fails to decide, e.g., due to an 
unsupported feature of the input program like threads or symbolic floats. 


1. Forward symbolic execution by JETKLEE for 333 seconds. JETKLEE is an
efficient, industrial-strength symbolic executor, and most of the solved
benchmarks are solved by JETKLEE.
2. Compact symbolic execution (CSE) [13] by SLOWBEAST for 60 seconds. In
most cases, CSE either finishes quickly or brings no benefit compared to 
standard forward symbolic execution. 

3. Backward symbolic execution with loop folding (BSELF) [5] by SLOWBEAST
without time limit. 

4. If BSELF fails, we perform forward symbolic execution by SLOWBEAST with- 
out time limit. The reason for this is that SLOWBEAST has better support 
for floating point arithmetic and threads than JETKLEE. 


If an error is found by any of the engines, it is replayed on the unsliced 
code. If the replay succeeds, we generate a violation witness. If the program 
is decided safe by BSELF, we generate a correctness witness containing the 
generated invariants. The other engines do not support invariant generation, 
therefore, if the program is decided safe by any of the other engines, we generate
a trivial correctness witness. 
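The control flow of this portfolio (fixed engine order, per-engine time limits, replay of found errors on the unsliced code, and the choice of witness) can be summarized in a short Python sketch. It is an illustration under stated assumptions, not SYMBIOTIC's control script: the `Engine` record, its `run` hook, and `replay_on_unsliced` are hypothetical names, the engines are stubbed with lambdas, and treating a failed replay as "undecided, continue with the next engine" is our assumption since the text does not say what happens in that case.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

SAFE, UNSAFE = "safe", "unsafe"  # an engine returns one of these, or None if undecided


@dataclass
class Engine:
    name: str
    # Hypothetical hook: runs the engine on the sliced program with a time limit
    # in seconds (None = no limit) and returns SAFE, UNSAFE, or None.
    run: Callable[[Optional[float]], Optional[str]]
    time_limit: Optional[float]
    generates_invariants: bool = False


def sequential_portfolio(engines, replay_on_unsliced) -> Tuple[Optional[str], str, Optional[str]]:
    """Run engines one after another; the first decided verdict wins."""
    for engine in engines:
        verdict = engine.run(engine.time_limit)
        if verdict == UNSAFE:
            # A bug found on the sliced program must be replayed on the unsliced code.
            if replay_on_unsliced(engine.name):
                return engine.name, UNSAFE, "violation witness"
            continue  # assumption: a failed replay counts as undecided
        if verdict == SAFE:
            witness = ("correctness witness with invariants"
                       if engine.generates_invariants else "trivial correctness witness")
            return engine.name, SAFE, witness
    return None, "unknown", None


# Stubbed engines in the order described above (verdicts are made up for the demo).
engines = [
    Engine("JetKlee forward SE", lambda t: None, 333),
    Engine("Slowbeast CSE", lambda t: None, 60),
    Engine("Slowbeast BSELF", lambda t: SAFE, None, generates_invariants=True),
    Engine("Slowbeast forward SE", lambda t: None, None),
]
print(sequential_portfolio(engines, replay_on_unsliced=lambda name: True))
```

Keeping the engines in a plain ordered list mirrors the "sequential portfolio" wording: reordering or retiming engines only changes data, not the control loop.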


Verification of Other Properties For other properties, SYMBIOTIC 10 uses 
the same workflow as SYMBIOTIC 9 [4]. In a nutshell, we identify program in- 
structions that can violate the property, instrument the program with code that 
dynamically checks the property violation before each of the identified instruc- 
tions, slice the program, and run either JETKLEE or SLOWBEAST. 


Compact Symbolic Execution We extended SLOWBEAST with compact symbolic execution (CSE) [13]. CSE analyzes each looping path of the execution and tries to summarize it by a quantified formula that describes the effect of k iterations of that cyclic path, where k is a free variable. For example, if we apply compact symbolic execution to the loop


while (i < n) { if (A[i] == 0) { break; }; i += 2; },

the path condition will be augmented by the quantified formula

$k \geq 0 \wedge \forall \tau.\, \big(0 \leq \tau < k \rightarrow (i + 2\tau < n \wedge A[i + 2\tau] \neq 0)\big).$


This allows symbolic execution to fully explore some programs with unbounded 
loops and find deep counterexamples. However, it works only for looping paths 
of specific form and requires potentially expensive quantified SMT reasoning. 
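To make the summary for the example loop concrete, the following Python sketch evaluates it on concrete arrays instead of symbolically. It is only a sanity check of what the formula expresses, not how SLOWBEAST reasons (which uses quantified SMT queries): if the loop completes exactly k iterations, the summary holds for every k' ≤ k and fails for k + 1. The helper names `run_loop` and `summary` are ours.

```python
def run_loop(A, i, n):
    """Concretely run `while (i < n) { if (A[i] == 0) break; i += 2; }` and
    return k, the number of completed iterations (i ends up as i + 2*k)."""
    k = 0
    while i < n:
        if A[i] == 0:
            break
        i += 2
        k += 1
    return k


def summary(A, i, n, k):
    """Evaluate the CSE summary on concrete values:
    k >= 0 and for all tau with 0 <= tau < k: i + 2*tau < n and A[i + 2*tau] != 0."""
    return k >= 0 and all(i + 2 * tau < n and A[i + 2 * tau] != 0 for tau in range(k))


A = [1, 5, 2, 0, 0, 3]
k = run_loop(A, 0, len(A))
print(k, summary(A, 0, len(A), k), summary(A, 0, len(A), k + 1))  # 2 True False
```

Because k stays free in the symbolic setting, one formula covers every iteration count at once, which is what lets CSE reach deep counterexamples without unrolling.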


Lazy Memory Initialization We extended JETKLEE with lazy memory ini- 
tialization, which constructs symbolic memory objects lazily during the first 
access to that object, not during its initialization. This allows isolated symbolic 
execution of functions without knowing their arguments and calling context. As 
all programs in SV-COMP start with the main function and there is no need to 
analyze an isolated function, we use this feature in the competition only to sup- 
port externally defined variables. Note that this cannot be achieved by merely 
making the externally defined variable symbolic, as it can be a pointer to exter- 
nal memory, which needs to be properly initialized. For this reason, externally 
defined variables were not supported by the previous version of SYMBIOTIC. 
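The idea of lazy initialization can be shown with a toy memory model in Python. This is a hypothetical sketch, not JETKLEE's implementation: values are plain strings standing in for symbolic expressions, the class and method names are invented for illustration, and aliasing between a lazily created object and existing objects is deliberately ignored. It shows the two points made above: an object exists only after its first access, and reading an unknown pointer yields a fresh target object that is itself initialized lazily.

```python
import itertools


class LazyMemory:
    """Toy model of lazy memory initialization for symbolic execution.

    Memory objects (e.g. an extern variable) are not created up front; a cell
    gets a fresh symbolic value on its first read. Reading a cell as a pointer
    yields a pointer to a fresh object that is itself initialized lazily."""

    def __init__(self):
        self._objects = {}               # object name -> {offset: value}
        self._counter = itertools.count()

    def materialized(self):
        """Names of the objects created so far."""
        return set(self._objects)

    def load(self, obj, offset=0, is_pointer=False):
        cells = self._objects.setdefault(obj, {})
        if offset not in cells:
            n = next(self._counter)
            if is_pointer:
                # Unknown pointer: points to a new, not yet materialized object.
                cells[offset] = ("ptr", f"lazy_obj{n}")
            else:
                cells[offset] = f"sym_{obj}_{offset}_{n}"
        return cells[offset]

    def store(self, obj, offset, value):
        self._objects.setdefault(obj, {})[offset] = value


mem = LazyMemory()
print(mem.materialized())                      # nothing exists before the first access
p = mem.load("ext_ptr", 0, is_pointer=True)    # extern pointer read for the first time
print(p, mem.load(p[1], 4))                    # its target is created only now
```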
Table 1. The comparison of SYMBIOTIC 9.1 and SYMBIOTIC 10 on the intersection of
benchmarks from SV-COMP 2023 and SV-COMP 2024. The table is computed from 
the official results of SV-COMP 2023 and SV-COMP 2024. 


Property           Benchmarks  Both solved  Only 10 solved  Only 9.1 solved
no-data-race              783            0               0                0
no-overflow              7502          442            4102                1
termination              1809         1220              10               31
unreach-call             9537         3577             116              225
valid-memcleanup           61           35               0                0
valid-memsafety          4113          416            1427               34


May-Happen-in-Parallel Analysis We improved slicing of parallel programs 
by employing a static may-happen-in-parallel analysis [11], which overapproxi- 
mates the set of pairs of program locations that can happen in parallel in different 
threads. Previously, SYMBIOTIC assumed that all possible pairs of instructions 
can happen in parallel, which reduced the effectiveness of slicing. The implementation
currently does not consider thread synchronization. For more details, see the 
bachelor’s thesis about the implementation [12]. In the future, we want to use 
this analysis also for proving some no-data-race properties. 
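The following Python sketch illustrates why such an analysis helps slicing compared to assuming that all pairs may run in parallel. It is a deliberately simple, hypothetical analysis, not the one from [11] or [12]: threads are created and joined only by `main` at known instruction indices, a `main` instruction may run in parallel with a thread only between its creation and its join, two threads only if their lifetimes overlap, and locks are ignored (matching the note that synchronization is not yet considered).

```python
from itertools import product

INF = float("inf")


def mhp_pairs(main_len, threads):
    """Over-approximate may-happen-in-parallel pairs for a toy program model.

    main_len: number of instructions in main, indexed 0..main_len-1.
    threads:  name -> (create_idx, join_idx or None, body_len); thread `name`
              is created by main's instruction create_idx and joined by
              main's instruction join_idx (None = never joined).
    Locks and other synchronization are ignored, so the result stays an
    over-approximation."""

    def lifetime(name):
        create, join, _ = threads[name]
        return create, (INF if join is None else join)

    pairs = set()
    for name, (_, _, body_len) in threads.items():
        create, join = lifetime(name)
        for m in range(main_len):
            if create < m < join:  # main is between create and join
                for t in range(body_len):
                    pairs.add((("main", m), (name, t)))
    names = sorted(threads)
    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            (ca, ja), (cb, jb) = lifetime(a), lifetime(b)
            if ca < jb and cb < ja:  # lifetimes overlap
                for ta, tb in product(range(threads[a][2]), range(threads[b][2])):
                    pairs.add(((a, ta), (b, tb)))
    return pairs


threads = {"T1": (1, 3, 2), "T2": (4, None, 1)}
print(len(mhp_pairs(6, threads)))  # 3, versus 20 pairs if every cross-thread pair counted
```

Fewer parallel pairs means fewer interference dependencies for the slicer, so more instructions can be removed.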


Other Changes All external dependencies of SYMBIOTIC 10 have been updated 
to newer versions and all parts of SYMBIOTIC 10 have been ported to LLVM 14.
Notably, this concerns JETKLEE, into which we merged most of the upstream 
changes from the base KLEE (more than 300 commits). 

We extended JETKLEE with support for generating YAML-based violation 
witnesses in witness format 2.0. SLOWBEAST still supports only the older
witness format 1.0 based on GraphML.

We also fixed incorrect overflow checking of 64-bit integers and incorrect 
modeling of fscanf for the purposes of static analysis and instrumentation. Due 
to these problems, SYMBIOTIC 9.1 did not support any of the *-Juliet benchmarks,
which are now fully supported. 

Unlike the previous versions of SYMBIOTIC, SYMBIOTIC 10 does not employ 
PREDATOR [6] as a static analyzer. This is due to technical difficulties during 
porting our version of PREDATOR to LLVM 14. This is a temporary solution and 
we plan to include PREDATOR in future versions of SYMBIOTIC.


2 Strengths and Weaknesses 


Standard forward symbolic execution suffers from path explosion and is unable 
to fully analyze programs with unbounded loops. Backward symbolic execution 
with loop folding and compact symbolic execution can finish analysis even for 
some programs with unbounded loops, yet they still suffer from path explosion
and will time out on programs with a large number of branching paths. 

The results of SV-COMP 2024 show that the combination of static analysis, 
instrumentation, program slicing, and several variants of symbolic execution is
efficient in practice, in particular for bug hunting. The static analyses are often 
able to prove that some parts of the code are correct or do not influence the 
property. These parts of the code then can be removed by slicing. This partly 
mitigates the scalability problem caused by path explosion. 


Results of Symbiotic 10 in SV-COMP 2024 SYMBIOTIC 10 participated 
in all categories of SV-COMP 2024 for C programs. It won silver medals in 
categories MemSafety and FalsificationOverall [1]. SYMBIOTIC 10 produced 19 
wrong answers; most of these are caused by imprecise modeling of the system 
functions setlocale and getopt_long. They are not fundamental problems of
the approach and will be fixed. 

Table 1 compares the results of SYMBIOTIC 9.1 in SV-COMP 2023 and 
SYMBIOTIC 10 in SV-COMP 2024 on the benchmarks that were used in both 
years. SYMBIOTIC 10 was able to correctly solve 5655 benchmarks that were not 
solved by SYMBIOTIC 9.1. From these, 5366 benchmarks (3990 no-overflow + 
1376 valid-memsafety) are from subcategories *-Juliet, which the previous 
version of SYMBIOTIC did not support. Unfortunately, 147 of the previously de- 
cided benchmarks from ConcurrencySafety-main with property unreach-call 
were not decided by SYMBIOTIC 10 due to a bug in our version of SLOWBEAST. 
Additionally, 31 of the previously decided benchmarks (16 in MemSafety-Heap and
15 in MemSafety-LinkedLists) were not decided by SYMBIOTIC 10 due to the
exclusion of PREDATOR. If PREDATOR had not been excluded or the wrong results
had been fixed, SYMBIOTIC 10 would have won the MemSafety category. 


3 Software Architecture, Usage, and Contributors 


All components of SYMBIOTIC 10 use LLVM 14 [9] for the intermediate
representation. To obtain the LLVM bitcode from the verified C program, SYMBIOTIC
relies on CLANG. Slicer and instrumentation module are written in C++ and rely 
on the library DG [3]. JETKLEE is implemented in C++ and SLOWBEAST [14] is 
written in Python. Both symbolic executors use Z3 [10] as the SMT solver. Con- 
trol scripts are written in Python. All the components and external dependencies 
have permissive open-source licenses. 

The binary form of SYMBIOTIC 10 is available on Zenodo [7]; the source code is
available from https://github.com/staticafi/symbiotic under the tag svcomp24.
You can run SYMBIOTIC with


bin/symbiotic --sv-comp --prp <prpfile> [--32] <source>


For details, see the file README.md in the mentioned repository. 
SYMBIOTIC 10 has been developed at the Faculty of Informatics of Masaryk 
University by the authors of this paper under the supervision of Jan Strejček.
Abstract. THETA is a model checking framework with a strong emphasis
on effectively handling concurrency in software using abstraction
refinement algorithms. In SV-COMP 2024, we use 1) an abstraction-aware
partial order reduction; 2) a dynamic statement reduction technique;
and 3) enhanced support for call stacks to handle recursive programs.
We integrate these techniques in an improved architecture with inherent
support for portfolio-based verification using dynamic algorithm
selection, with a diverse selection of supported SMT solvers as well. In this
paper we detail the advances of THETA regarding concurrent and
recursive software support.


Funding. This research was partially funded by the UNKP-23-{2,3}-I New National 
Excellence Program; Project no. 2019-1.3.1-KK-2019-00004 (implemented with the 
support provided from the NRDI Fund of Hungary under the 2019-1.3.1-KK fund- 
ing scheme); and the Doctoral Excellence Fellowship Programme (funded by the NRDI 
Fund of Hungary and the BME University). 


1 Verification Approach 


THETA [15,8] first competed at SV-COMP as a standalone tool in 2022, with
initial support for some multi-threaded tasks using a crude version of a partial 
order reduction (POR) algorithm [2], and no practical support for recursion. 
This year, we implemented a novel abstraction-based partial order reduction 
algorithm [13] that enables THETA to solve significantly more tasks compared 
to previous SV-COMPs, especially in the ReachSafety category. Our algorithm 
considers two program statements independent even if they use the same shared 
variable when the current abstraction has no information about this variable. For 
example, the statements y = x and x = 1 are classically considered dependent 
due to x. However, if the current abstraction has no information about x (e.g.,
we only track the predicates y > 0 and z = y), we consider these statements
independent, as they are commutative in the abstract state space. We extend a
static source-set-based POR algorithm [1] with our abstraction-based technique.
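The difference between classical and abstraction-aware dependence for the example above can be sketched in Python. This is a simplified illustration under our own assumptions, not THETA's implementation: statements are reduced to read and write sets, and "the abstraction has information about a variable" is approximated by the set of variables occurring in the tracked predicates. Two statements are dependent if they conflict (at least one write) on a variable; the abstraction-aware variant only counts conflicts on tracked variables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    text: str
    reads: frozenset
    writes: frozenset


def conflict_vars(s1, s2):
    """Variables both statements access where at least one access is a write."""
    return (s1.writes & (s2.reads | s2.writes)) | (s2.writes & s1.reads)


def classically_dependent(s1, s2):
    return bool(conflict_vars(s1, s2))


def abstractly_dependent(s1, s2, tracked_vars):
    """Ignore conflicts on variables the current abstraction tracks no information about."""
    return bool(conflict_vars(s1, s2) & tracked_vars)


y_eq_x = Stmt("y = x", reads=frozenset({"x"}), writes=frozenset({"y"}))
x_eq_1 = Stmt("x = 1", reads=frozenset(), writes=frozenset({"x"}))
tracked = frozenset({"y", "z"})  # variables of the predicates y > 0 and z = y
print(classically_dependent(y_eq_x, x_eq_1))          # True
print(abstractly_dependent(y_eq_x, x_eq_1, tracked))  # False
```

Once a later refinement starts tracking a predicate over x, the same pair becomes dependent again, so the reduction adapts to the current precision.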

A novel statement reduction algorithm has also been developed for the
verification of concurrent programs [14]. Our algorithm is similar to program
slicing and cone-of-influence techniques in the sense that it detects and removes
statements that do not affect the verified property BH]. However, our approach 
analyzes the current local states of concurrent threads and data-flow between 
threads to dynamically detect irrelevant statements that do not affect the verified 
property in the current thread interleaving. The evaluation of such statements is 
skipped, which considerably reduces the time cost of successor state calculation
during state space exploration. Our technique is especially useful for concurrent 
tasks where the reducing capability of existing slicing and cone-of-influence tech- 
niques is limited due to the many possible interleavings of threads: our algorithm 
can skip (sub-)statements in certain contexts even if these statements cannot be 
removed generally (that is, statements that may be important in other thread 
interleavings). Our algorithm is different from dynamic program slicing [9] since 
those techniques do not consider the current interleaving of threads for slicing. 

THETA has been extended with enhanced interprocedural analysis [12]. Previously,
all procedures were inlined at all of their call sites before verification.
Procedure support was implemented last year, which handles procedures dynam- 
ically during verification, using a stack to keep track of calling locations. This 
year, procedure support is further improved by applying abstraction to location 
stacks. If an abstract state overapproximates another with the bottom of their 
stacks abstracted away, then all abstract paths going out from the covered state 
are present at the covering state until the current procedure returns. Therefore, 
the top location of the covered state is popped and exploration continues from 
the outer procedure, eliding unnecessary exploration [12]. 

The main advantage of handling procedures dynamically is that it allows 
THETA to verify recursive programs, which was not possible with inlining. Ap- 
plying abstraction to stacks also enables the verification of some infinitely recur- 
sive programs. Additionally, it reduces the size of the abstract state-space and 
improves THETA's verification performance with predicate abstraction.


2 Software Architecture 


Since last year, we opted to keep our initial portfolio-based approach B], but 
used a separate process for each configuration, which can easily be killed using 
signals, as opposed to the thread-based approach of THETA at SV-COMP’22. 
Furthermore, we created a generic interface that allows easy co-development of 
portfolios without having to recompile THETA. The architecture of THETA can 
be seen in Figure 1. THETA parses and transforms the input program into an
eXtended CFA, then, based on the configuration in the portfolio, spawns one or
more worker THETA processes that perform the verification. The portfolio engine
has been rewritten this year to better support pre-compiled configurations
written in Kotlin instead of Kotlin scripts, after we uncovered the severe
performance overhead of the script execution engine, which often takes several
tens of seconds to initialize and start. Dynamic algorithm selection is used to
select a suitable configuration for each input task, with several ways of
recovering should the first algorithm take too long or encounter an exception.

Fig. 1: The architecture of THETA for software verification
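The benefit of running each configuration in its own process is that a worker that takes too long can simply be killed and the next configuration tried. A minimal Python sketch of this recovery scheme follows. It is an assumption-laden illustration, not THETA's Kotlin portfolio engine: the worker commands are stand-in Python one-liners, and the convention that a worker prints `SAFE` or `UNSAFE` and exits with status 0 is ours. `subprocess.run` with `timeout` kills the child process when the limit expires and then raises `TimeoutExpired`.

```python
import subprocess
import sys


def run_process_portfolio(configs, timeout_s):
    """Try configurations in order, each in its own OS process.

    configs: list of (name, argv) pairs; a worker is assumed to print SAFE or
    UNSAFE on stdout and exit with status 0 when it reaches a verdict."""
    for name, argv in configs:
        try:
            proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            continue  # subprocess.run already killed the worker; recover with the next one
        verdict = proc.stdout.strip()
        if proc.returncode == 0 and verdict in ("SAFE", "UNSAFE"):
            return name, verdict
        # Crash or inconclusive output (e.g. an exception in the worker): fall through.
    return None, "UNKNOWN"


# Stand-in workers; real THETA worker command lines are not shown here.
configs = [
    ("too-slow", [sys.executable, "-c", "import time; time.sleep(5)"]),
    ("crashes", [sys.executable, "-c", "raise RuntimeError('unsupported feature')"]),
    ("decides", [sys.executable, "-c", "print('SAFE')"]),
]
print(run_process_portfolio(configs, timeout_s=1))  # ('decides', 'SAFE')
```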

THETA uses Z3 versions 4.12.2 and 4.5.0 (the latter is integrated natively
via the Java API, while the former is used via SMT-LIB), MathSAT [7] version
5.6.10, CVC5 version 1.0.8, and Princess version 2023-06-19 as SMT
solvers under the hood. Unlike in previous years, THETA uses the new
interpolation API of Z3 to support interpolation-dependent refinement strategies
with the newer solver version (interpolation had previously been removed from Z3 in version 4.8.0).

THETA has seen several major updates in its C-frontend for the new tasks 
introduced to the benchmark repository since SV-COMP’23. The most notable 
improvements were made around its ANTLR-based grammar for lexing and pars- 
ing C files, and some further tweaks in the transformation step from the AST to 
CFA to avoid some wrong verdicts that plagued THETA in earlier SV-COMPs. 


3 Strengths and Weaknesses of the Approach 


In ReachSafety, THETA achieved a score of 2119 [o]. Although THETA still has
known limitations regarding some C elements (e.g., structs), recent technical
improvements of the frontend resulted in THETA not giving any wrong results
in any category, except for 3 wrong results in ConcurrencySafety-NoOverflows.
Furthermore, THETA achieved a score of 2354 in ConcurrencySafety. To show the 
negative influence of frontend limitations, we recalculated the score for the par- 
ticipating tools on those ConcurrencySafety tasks that did not end in a frontend 
failure for THETA. In this alternative scoring, THETA would move from 7th
to 3rd place, highlighting the serious need for further frontend development.

It is worth looking at THETA's performance in the reachability category over
the years. As seen in Figure 2, THETA's performance dipped in last year's
installment of SV-COMP (the figure shows only those tasks that have been the
same for the last 3 years) from that of SV-COMP’22 b]. This year we managed 
to bring the performance back up, even outperforming THETA'22, especially in the
ConcurrencySafety, Sequentialized, and Combinations subcategories. However, we
did lose a significant number of tasks in some other subcategories, such as Loops.
Fig. 2: Overview of successful tasks for THETA per year on common tasks (bars: Theta'22, Theta'23, Theta'24; y-axis: tasks solved)


This can be either a result of a suboptimal portfolio for such tasks or the result
of some tweaks we had to make in order to achieve this year's outstanding zero
incorrect results, a feat achieved by only 3 other tools. We plan to prioritize the
analysis of these cases for future development. We also plan to support categories 
such as ProductLines and Heap, where we have almost no successful results. This 
entails supporting structs, function pointers, and heap manipulation. 

The novel algorithms implemented in THETA especially helped recursive and 
multithreaded programs. THETA gained support for recursive programs by imple- 
menting the aforementioned stack-based approach, and support for reachability 
queries in multithreaded programs grew more than 3.5-fold since last year, as
seen in Figure 2. In particular, our internal evaluation shows that the state
space explored with the abstraction-based partial order reduction algorithm
is 15% smaller on average than with traditional partial order reduction. Our
dynamic statement reduction technique can eliminate 22% of statements, reducing
the time of successor state calculation by up to 60% and the overall verification
time by 15% on average, depending on the configuration.


4 Tool Setup and Configuration 


THETA is vastly configurable [8], and choosing a well-performing configuration
for the verification task at hand can be complicated. For software verification,
we recommend using the portfolio (complex) in the competition archive [3]:
./theta-start.sh <input> --portfolio COMPLEX. To minimize the output
verbosity and produce a witness, --loglevel RESULT and --witness-only can 
be added to the arguments. We also used these options at SV-COMP 2024. 


5 Software Project and Data Availability 


THETA is a verification framework maintained by the Critical Systems Research 
Group of the Budapest University of Technology and Economics. The project 
is available open-source on GitHub¹ under an Apache 2.0 license. The version
(5.0.0) used in the competition is available at [8]. 


¹ https://github.com/ftsrg/theta/releases/tag/svcomp24
Abstract. The verification of ULTIMATE AUTOMIZER works on an
SMT-LIB-based model of a C program. If we choose an SMT-LIB theory of
(mathematical) integers, the translation is not precise, because we
overapproximate bitwise operations. In this paper we present a translation for
bitwise operations that improves the precision of this overapproximation.


1 Verification Approach 


ULTIMATE AUTOMIZER (in the following abbreviated as UAUTOMIZER) is a 
software verifier that implements the trace abstraction approach [6,9]. In trace 
abstraction, a verification problem is considered as a formal language and decom- 
posed via automata-theoretic methods into smaller verification problems. While 
verifying a C program, UAUTOMIZER applies trace abstraction to a model of 
the program that consists of a control-flow graph (CFG) and SMT-LIB formulas 
that express how the program’s data is modified while moving along an edge 
of the CFG. We obtain this model by first translating the C program into a 
Boogie [10] program and afterwards translating the Boogie program into the 
CFG and SMT-LIB formulas. We have two variants of these translations; we
call them the integer-based translation and the bitvector-based translation. The
integer-based translation results in a Boogie program over mathematical
integers that is later translated to SMT-LIB formulas from the integer theory. The
bitvector-based translation results in a Boogie program over bitvectors that is
later translated to SMT-LIB formulas from the bitvector theory. The
integer-based translation uses modulo operations to make sure that the result of each
arithmetic operation is in the correct range. It also overapproximates the result of
bitwise operations and is hence not very precise. If the trace abstraction-based
verification algorithm returns a counterexample that contains an
overapproximated operation, UAUTOMIZER does not return the counterexample but
unknown instead. The bitvector-based translation yields a precise model,
but its verification is costly. In order to mitigate the shortcomings of both
translations, UAUTOMIZER first runs the verification on the integer-based model.
If the result is unknown, the tool is run again on the bitvector-based model.
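The role of the modulo operations in the integer-based translation can be illustrated for 32-bit unsigned arithmetic, whose wraparound is defined by C. The Python sketch below is illustrative only: it applies the modulo eagerly after every operation, which is our assumption (the text does not say when UAUTOMIZER inserts it), and it does not cover signed arithmetic, where overflow is undefined behavior rather than wraparound. The helper names are hypothetical.

```python
BITS = 32
MOD = 1 << BITS  # 2**32 values of a 32-bit unsigned int


def u32(value):
    """Bring a mathematical integer back into the range of unsigned int."""
    return value % MOD


def add_u32(a, b):
    return u32(a + b)


def sub_u32(a, b):
    return u32(a - b)


def mul_u32(a, b):
    return u32(a * b)


print(add_u32(MOD - 1, 1), mul_u32(1 << 31, 2), add_u32(3, 4))  # 0 0 7
```

Arithmetic such as `+`, `-`, and `*` fits this scheme directly, whereas bitwise operators have no such simple integer counterpart, which is why they were overapproximated.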


* Jury Member: Matthias Heizmann 


© The Author(s) 2024 
B. Finkbeiner and L. Kovács (Eds.): TACAS 2024, LNCS 14572, pp. 418-423, 2024. 
https://doi.org/10.1007/978-3-031-57256-2_31


Ultimate Automizer and the Abstraction of Bitwise Operations 419 


2 Abstraction of Bitwise Operations 


In the past, our integer translation overapproximated the bitwise operators, i.e.,
&, |, ^, ~, <<, >> returned some nondeterministic value. In this paper we show how
to translate bitwise operators more precisely. Our translation is a generalization 
of the work of Liu et al. [11]. First we describe the translation of the operators 
&, |, ^. The remaining operators will be explained at the end of this section. For 
the operators &, |, ^ we distinguish three different cases: 


— If both operands are literals, we replace the operation by its result. 

— If one operand is a literal with a specific bit-pattern, we rewrite the expres- 
sion directly. 

— Otherwise we overapproximate with additional constraints for the return 
value. 


Rewrite rules. If one of the operands is a literal, we try to replace the bitwise
operation by an arithmetic operation based on the bit-pattern of the literal.
These rewrite rules are shown in Table 1 (omitting symmetric cases). The first
two cases are simple. In the first row every bit is zero (i.e., the operand is 0). Zero
is the absorbing element for & and the neutral element for | and ^. In the second
row every bit is one (i.e., the operand is -1 for signed integers or the maximum
value for unsigned integers). This is the neutral element for & and the absorbing
element for |. The last two cases are motivated by typical bitmasks and are a
generalization of the first two cases. In a C program, bitmasks are used to set
bits to zero or to one. For example, the expression x & 255 can be used to set
every bit of x to zero except for the last 8 bits. The third row is motivated by
Liu et al. [11]. They rewrote x & 1 (i.e., only the last bit is one) to x % 2, whereas
we generalize this case to any pattern that consists of zeros followed only by ones. With the rule
in the third row, the expression x & 255 is rewritten to x % 256. In the last row,
only the leading bits are one. This case works analogously to the third row;
it is rewritten using a combination of modulo and other arithmetic operators.
We implemented these rules in our translation from C to Boogie. Boogie has
mathematical integer semantics, so the evaluation of the expressions in the table
can never lead to an overflow. The rules for the operators | and ^ are based on
the equalities a | b = a + b - (a & b) and a ^ b = a + b - 2 * (a & b).
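Because Python integers are unbounded and its bitwise operators behave like two's complement with infinite sign extension, the rewrite rules can be brute-force checked against the real operators. The sketch below encodes the four bit-pattern cases as we read them: c = 0, c = -1, c = 2**k - 1 (zeros followed by ones), and c = -(2**k) (ones followed by zeros in the signed view, using the modulus -c). The function `rewrite` is a hypothetical name for illustration, not UAUTOMIZER code; Python's `%` is non-negative for positive divisors.

```python
def rewrite(op, x, c):
    """Arithmetic replacement for `x op c` chosen from the bit-pattern of literal c.

    Returns None when no rule applies (the translation would then fall back to
    the constrained overapproximation). Mathematical integers; c < 0 is read as
    a two's complement bit-pattern."""
    if c == 0:                                  # 0...0
        return {"&": 0, "|": x, "^": x}[op]
    if c == -1:                                 # 1...1 (signed view)
        return {"&": x, "|": c, "^": c - x}[op]
    if c > 0 and (c + 1) & c == 0:              # 0...01...1, i.e. c = 2**k - 1
        m = c + 1
        return {"&": x % m, "|": x + c - x % m, "^": x + c - 2 * (x % m)}[op]
    if c < 0 and (-c) & (-c - 1) == 0:          # 1...10...0, i.e. c = -(2**k)
        m = -c
        return {"&": x - x % m, "|": c + x % m, "^": c - x + 2 * (x % m)}[op]
    return None


def check_rules(xs, ks):
    """Compare every applicable rule with Python's own &, |, ^ operators."""
    for k in ks:
        for c in (0, -1, (1 << k) - 1, -(1 << k)):
            for x in xs:
                for op, actual in (("&", x & c), ("|", x | c), ("^", x ^ c)):
                    assert rewrite(op, x, c) == actual, (op, x, c)
    return True


print(check_rules(range(-300, 301), range(0, 10)))  # True
print(rewrite("&", 1000, 255))                      # 232, i.e. 1000 % 256
```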


Constrained Overapproximation. If neither operand is a literal with one of the
bit-patterns above, we translate the bitwise operations to calls of the Boogie
procedures in Fig. 1 as follows: x & y is translated to and(x, y), x | y is
translated to or(x, y), and x ^ y is translated to xor(x, y). We omitted the
definition of or(a, b) here, because a possible implementation could simply use
the relation between & and | to return a+b-and(a, b). The first lines of and and
xor cover the cases that are handled precisely, i.e., where one of the operands is
zero or both are equal. In all other cases, the procedures return a
nondeterministic value to overapproximate the behavior of the bitwise operator.
We constrain this value via assumptions that often provide lower and upper
bounds. For example, if a and b are both non-negative, and(a, b) also returns
a non-negative value that is less than or equal to both a and b. Similarly,
xor(a, b) returns a positive value that is less than or equal to the sum a+b in
that case.


Table 1: Rewrite rules based on the bit-pattern of c


  bits(c)        x & c           x | c                 x ^ c
  0...0          0               x                     x
  1...1          x               c                     c - x
  0...01...1     x % (c+1)       x + c - x % (c+1)     x + c - 2*(x % (c+1))
  1...10...0     x - x % (-c)    c + x % (-c)          c - (x - 2*(x % (-c)))


  procedure and(a: int, b: int) {
    if (a == 0 || b == 0) return 0;
    if (a == b) return a;
    var r: int;
    assume (a>=0 || b<0) ==> r<=a;
    assume (a<0 || b>=0) ==> r<=b;
    assume (a>=0 || b>=0) ==> r>=0;
    assume (a<0 || b<0) ==> r>a+b;
    return r;
  }

  procedure xor(a: int, b: int) {
    if (a == 0) return b;
    if (b == 0) return a;
    if (a == b) return 0;
    var r: int;
    assume (a>=0 <==> b>=0) ==> r>0;
    assume !(a>=0 <==> b>=0) ==> r<0;
    assume (a>=0 || b>=0) ==> r<=a+b;
    return r;
  }


Fig. 1: Procedures to overapproximate the operators & and ^
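An overapproximation is only useful if it is sound, i.e., the true result of the operator always satisfies the assumed constraints. The Python sketch below brute-forces this property for the assumptions of the and and xor procedures over a small range of operands, again relying on Python's two's complement semantics for `&` and `^` on unbounded integers. The helper names are ours; the constraints are transcribed from the procedures, with each implication `p ==> q` written as `not p or q`.

```python
def and_constraints(a, b, r):
    """Assumptions of `and` (reached only when a, b != 0 and a != b)."""
    return all([
        not (a >= 0 or b < 0) or r <= a,
        not (a < 0 or b >= 0) or r <= b,
        not (a >= 0 or b >= 0) or r >= 0,
        not (a < 0 or b < 0) or r > a + b,
    ])


def xor_constraints(a, b, r):
    """Assumptions of `xor` (reached only when a, b != 0 and a != b)."""
    same_sign = (a >= 0) == (b >= 0)
    return all([
        not same_sign or r > 0,
        same_sign or r < 0,
        not (a >= 0 or b >= 0) or r <= a + b,
    ])


def check_sound(lo=-64, hi=64):
    for a in range(lo, hi + 1):
        for b in range(lo, hi + 1):
            if a == 0 or b == 0 or a == b:
                continue  # handled precisely by the early returns
            assert and_constraints(a, b, a & b), (a, b)
            assert xor_constraints(a, b, a ^ b), (a, b)
    return True


print(check_sound())  # True
# Still an overapproximation: 0 satisfies the constraints for and(5, 3), but 5 & 3 == 1.
print(and_constraints(5, 3, 0), 5 & 3)
```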


Negation and Shifts. We rewrite the negation ~x to the equivalent expression
-1 - x. We rewrite shift operators if the second operand is a literal. The left shift
x << y is rewritten to x * c and the right shift x >> y is rewritten to x / c, where c
is the literal that is obtained by evaluating pow(2, y). The rewritten expression
x * c overflows if and only if the original expression x << y overflows.
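The identities behind these rewrites can be spot-checked in Python, whose unbounded integers match Boogie's mathematical integers. This sketch only checks value equality; it does not model C's finite widths, so the overflow correspondence stated above is not tested here. The right-shift check is restricted to non-negative x, since shifting negative values right is implementation-defined in C; for x >= 0, floor division `//` and truncating division agree.

```python
def check_negation_and_shifts(xs, ys):
    for x in xs:
        assert ~x == -1 - x                  # ~x rewritten to -1 - x
        for y in ys:
            c = 2 ** y                       # the literal pow(2, y)
            assert x << y == x * c           # left shift as multiplication
            if x >= 0:                       # right shift of negative values is
                assert x >> y == x // c      # implementation-defined in C; skipped
    return True


print(check_negation_and_shifts(range(-256, 257), range(0, 16)))  # True
```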


3 Strengths and Weaknesses 


UAUTOMIZER won the overall category and the category NoOverflows in SV- 
COMP 2024 [2]. UAUTOMIZER reported 10 incorrect results, which were due to 
incorrect modelling of C features. 

We evaluated the abstraction of bitwise operations on selected benchmarks 
from SV-COMP 2024. The evaluation was performed on an AMD Ryzen
Threadripper 3970X using 2 cores at 3.7 GHz with a time limit of 900 s and a memory
limit of 8 GB. Table 2 shows the results of the evaluation on the
category ReachSafety. We chose this category because it contains a wide range
of benchmarks, including several that make use of bitwise operators. There we
compared three settings: the bitvector-based translation, the old integer-based 
translation where every bitwise operation is allowed to return any value and the 
integer-based translation with the optimizations described in Section 2. The re- 
sults show that the new integer-based translation can verify 25 more benchmarks 
than the old integer-based translation (from various folders, e.g. hardness-nfm22 
Table 2: Comparison on ReachSafety


                   Bitvector            Integer (optimized)   Integer (old)
                   #     time   mem     #     time   mem      #     time   mem
                         (h)    (GB)          (h)    (GB)           (h)    (GB)
  total (10 205)   1958  65     1862    2076  37     2600     2051  36     2550
  safe (7557)      1183  41     1030    1350  22     1510     1324  21     1440
  unsafe (2648)     775  24      832     726  15     1090      727  15     1110


Table 3: Comparison on Termination-BitVectors


                   Integer (optimized)   Integer (old)
                   #    time   mem       #    time   mem
                               (GB)                  (GB)
  total (37)       31   410    12.1      12   122    4.2
  safe (23)        23   325     9.2       7    73    2.5
  unsafe (14)       8    85     2.9       5    49    1.7


and hardware-verification) and 118 more than the bitvector-based translation. 
The bitvector-based translation is precise, in contrast to the integer-based
translation. Overall, this precision does not pay off, as the model produced by the bitvector-based
translation is often too costly to verify. However, the precision can also be
helpful: the bitvector-based translation finds 48 (resp. 49) more bugs than the
old (resp. optimized) integer-based translation.

We also evaluated our approach on the subcategory Termination-BitVectors,
where most of the benchmarks contain bitwise operations. Since our termination
analysis does not support bitvectors, we compared our approach only with the
old integer-based translation. The results in Table 3 show that our optimized
approach is sufficient to prove the (non-)termination of 31 of the 37 tasks,
whereas the trivial overapproximation is only sufficient for 12.


4 Architecture, Setup, Configuration, and Project 


UAUTOMIZER is part of ULTIMATE [15,16], a program analysis framework writ- 
ten in Java and licensed under LGPLv3. UAUTOMIZER is an automaton-based 
model checker using a CEGAR-loop approach [8]. The submitted version 0.2.4- 
0e0057cc requires Java 11 and Python 3.6. Its Linux version, binaries of the 
required SMT solvers Z3 [12,13], CVC4 [1,14], MathSAT [4,7], and a Python 
wrapper script were submitted as a .zip archive. UAUTOMIZER is invoked with 


./Ultimate.py --spec <p> --file <f> --architecture <a> --full-output 


where <p> is an SV-COMP property file, <f> is an input C file, <a> is the
architecture (32bit or 64bit), and --full-output enables verbose output to stdout.
Witnesses are written to the files witness.graphml and witness.yml. The
benchmarking tool BENCHEXEC [3] supports UAUTOMIZER through the tool-info module
ultimateautomizer.py. UAUTOMIZER participates in all categories, as declared
in its benchmark definition file uautomizer.xml.
Data Availability. The competition contribution for UAUTOMIZER is available
as an archive on Zenodo [5].